Perl, Python, and Ruby: Extended test assertions and diagnostics

(I am part two of a series)

To further illustrate the design and architecture of each language’s testing tools, the next topic is how to build customized testing assertions on top of the existing ones.

This section will start with the relatively straightforward task of checking an associative array for the presence of a given key – with diagnostics returned to the user on failure – and will then complicate that task by examining how one would go about generalizing the extended assertion to work across different testing libraries.

Perl

Perl – like Python and Ruby – has a built-in for checking if a key exists in a hash (Perl-ese for an associative array), which returns a true or false value: exists $hash{$key}. Perl’s basic testing assertion, Test::More::ok(), accepts a boolean input, so the two can be simply combined:

ok( exists $hash{$key}, "Key $key exists in hash" );

It would be more useful to show diagnostic information to whoever is investigating the failed testing assertion. As discussed previously, Perl’s testing assertions are more properly testing predicates – they return a truth value, and we can use that to decide whether to print diagnostics using Test::More::diag():

ok( exists $hash{$key}, "Key $key exists in hash" ) ||
    diag("Hash contained keys: " . join ', ', sort keys %hash );

This prints the more helpful TAP output:

not ok 1 - Key Waldo exists in hash
#   Failed test 'Key Waldo exists in hash'
#   at waldo.t line 16.
# Hash contained keys: Bar, Baz, Foo

Diagnostics are separated from test results using the # character that Perl also uses for comments.

This is easily packaged up into a reusable function:

sub test_hash_has_key {
    my ( $hash, $key ) = @_;

    return ok( exists $hash->{$key}, "Key $key exists in hash" ) ||
        diag("Hash contained keys: " . join ', ', sort keys %$hash );
}

However, the use of the Test::More function ok() adds a layer of unneeded indirection; it’s possible instead to talk directly to the Test::Builder singleton that Test::More and virtually every other Perl testing library uses under the hood:

use base 'Test::Builder::Module';

sub test_hash_has_key {
    my ( $class, $hash, $key ) = @_;
    
    # Get the Test::Builder singleton
    my $builder = $class->builder;

    # Run the test, and save its pass/fail state in $status
    my $status = $builder->ok(
        exists $hash->{$key},
        "Key $key exists in hash"
    );
    
    # Print diagnostics if it failed
    unless ( $status ) {
        $builder->diag(
            "Hash contained keys: " . join ', ', sort keys %$hash );
    }

    # Pass back the pass/fail status to the caller
    return $status;
}

The result is very simple code, but also very useful: it can be used almost anywhere in the Perl testing ecosystem. It reports the status of the testing assertion on both success and failure, and adds diagnostics on failure, in a way that will integrate with no additional work into test suites built with the following (a brief usage sketch follows the list):

  • Test::Class, Perl’s xUnit work-alike
  • Test::Expectation, Perl’s RSpec clone
  • Test::WWW::Mechanize
  • Test::BDD::Cucumber, Perl’s Cucumber port (which we’ll be looking at later)
  • And almost without exception, every other weird and wonderful Perl testing library
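As a brief sketch of how this might be packaged and consumed in practice – the module name My::Test::Hash and the exported function hash_has_key are invented for illustration, but the Test::Builder::Module/Exporter plumbing is the standard, documented pattern:

package My::Test::Hash;
use strict;
use warnings;
use base 'Test::Builder::Module';

# Test::Builder::Module inherits from Exporter
our @EXPORT = ( 'hash_has_key' );

sub hash_has_key {
    my ( $hash, $key ) = @_;

    # The same Test::Builder singleton used by Test::More et al.
    my $builder = __PACKAGE__->builder;

    my $status = $builder->ok(
        exists $hash->{$key},
        "Key $key exists in hash"
    );

    $builder->diag( "Hash contained keys: " . join ', ', sort keys %$hash )
        unless $status;

    return $status;
}

1;

A test file then uses it like any other testing library:

# waldo.t
use Test::More tests => 1;
use My::Test::Hash;

hash_has_key( { Foo => 1, Bar => 2, Baz => 3 }, 'Waldo' );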

Python

(HEY! This section only exists in the blog format of this piece. The next section illustrates a bunch of “problems” in Python testing that aren’t problems at all in the real world. The tooling stops them from being issues. However, the way the problems are handled by the tooling is important and interesting, so this nitpicks the problems to set up discussion of the solution.)

Python has a simple and particularly readable structure for testing for key membership of a dict (Python-ese for an associative array). So one could signal failure of a testing assertion in Python in a portable way that will be caught and understood by the majority of testing libraries: simply raise an AssertionError:

if key not in d:
    raise AssertionError("Key %s does not exist in dict" % repr(key) )

Adding diagnostic information is slightly more complicated, as there’s no standard way to do it across different libraries (or indeed within most individual libraries). One can manually write to STDERR, and hope for the best:

import sys

if key not in d:
    keydesc = ", ".join(d.keys())
    sys.stderr.write("Dict contained keys: %s\n" % keydesc)
    raise AssertionError("Key %s does not exist in dict" % repr(key))

Text written to STDERR is explicitly summarized when the test is run using PyTest, but otherwise has nothing to distinguish it as relating to the test rather than being any other diagnostic output. Instead, the same information can be included in the message of the AssertionError itself; if an AssertionError is being raised, something has gone wrong and diagnostics are appropriate:

if key not in d:
    keydesc = ", ".join(d.keys())
    raise AssertionError(
        "Key %s does not exist in dict; extant keys: %s" %
        (repr(key), keydesc))

However, two problems are obvious:

Firstly, explicitly raising an AssertionError means that no actual testing assertion is being exercised – there is no positive assertion path. The parent library is unable to detect that a testing assertion has occurred, and so can’t keep statistics (such as counting the number of assertions run or printing descriptions of successful assertions), can’t make use of any special behaviour encoded in its testing assertions (such as coverage tracing), nor can it assign benchmarking results to given assertions.

This first problem can be solved by moving the predicate itself (key not in d) into a library-provided testing assertion. But we have to do this separately and differently for each testing library we want to integrate with:

# unittest
self.assertTrue( key in d, "Key %s exists in dict" % repr(key) )

or

# PyTest
assert key in d, "Key %s exists in dict" % repr(key)

The second problem concerns having conflated diagnostic information and the assertion identifier. The earlier Wikipedia-derived definition for a testing assertion – “an assertion, along with some form of identifier” – hints that it would be useful to identify and track assertions. Overloading the identifier to include diagnostics isn’t just philosophically ugly in its mixing of concerns; it hampers the ability to reliably identify an assertion – perhaps the continuous deployment tool running the tests keeps statistics on assertions that frequently fail, or benchmarks the amount of time taken to get to an assertion over time.
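To make the identification problem concrete, here is a small invented illustration – the same logical assertion yields a different message (and so a different would-be identifier) whenever the fixture data changes:

# The same logical assertion, run against two different fixtures
for d in ({"Foo": 1}, {"Foo": 1, "Bar": 2}):
    try:
        assert "Waldo" in d, (
            "Key 'Waldo' does not exist in dict; extant keys: %s"
            % ", ".join(d.keys()))
    except AssertionError as e:
        print(e)

# Key 'Waldo' does not exist in dict; extant keys: Foo
# Key 'Waldo' does not exist in dict; extant keys: Foo, Bar
#
# A tool keying on the message sees two unrelated "assertions".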

One could attempt to solve both problems by relying on the knowledge that testing assertions from all libraries will raise the same kind of catchable exception. Run the appropriate testing assertion, and intercept the raised error to add diagnostics:

import sys

try:
    msg = "Key %s exists in dict" % repr(key)
    # Use the library-provided assertion where a unittest object is
    # available; otherwise fall back to a bare assert
    if 'unittest_object' in vars():
        unittest_object.assertTrue( key in d, msg )
    else:
        assert key in d, msg
except AssertionError:
    # Emit diagnostics, then re-raise so the harness still sees the failure
    keydesc = ", ".join(d.keys())
    sys.stderr.write("Dict contained keys: %s\n" % keydesc)
    raise

The lack of a specific diagnostic system means there’s always an unpalatable choice between pushing diagnostics as unstructured text to the potentially noisy STDERR, and overloading the testing assertion identifier; neither is ideal.

Interestingly, xUnit-based testing libraries (like unittest) work around both issues by making the smallest reportable unit the TestCase (Hamill 2004) – a named block that can potentially contain several testing assertions – rather than the actual testing assertions they provide. The testing assertions are collated into one named meta-assertion:

def test_waldo_found(self):
    d = self.fixture_dict

    # A real unittest assertion
    self.assertTrue( len(d) > 0, "dict has some items" )

    # Manually raising a failure
    if "Waldo" not in d:
        keydesc = ", ".join(d.keys())
        raise AssertionError(
            "Key 'Waldo' does not exist in dict; extant keys: %s" %
            keydesc)

If the method representing the TestCase is treated as a meta testing assertion for reporting – rather than recording the status of the testing assertions it’s made up of – then a positive testing assertion path is regained (the TestCase was run and did not fail), as is a testing assertion identity separate from diagnostic information (the method name of the TestCase vs the message in the raised AssertionError).

Essentially, if:

  • One trusts that the person using the testing assertion is treating it as a small part of a bigger meta testing assertion; and
  • The bigger meta-testing assertion has a stable and high-quality identifier; and
  • The bigger meta-testing assertion not failing is recorded and treated as an assertive success

Then one can fall back to the already-seen solution of explicitly raising an AssertionError and overloading its test name with diagnostics:

if key not in d:
    keydesc = ", ".join(d.keys())
    raise AssertionError(
        "Key %s does not exist in dict; extant keys: %s" %
        (repr(key), keydesc))

Python presents a unified mechanism for representing test assertion failure, but there is no unified mechanism for representing assertion success, and thus no mechanism for specifying more generally that a testing assertion took place. The lack of a specific diagnostic channel for assertions to use means that the author of an extended diagnostic testing assertion will need to think carefully about how to provide this information.

In practice though, the majority of Python testing is done using unittest (or tools based on it) – which uses the TestCase pattern above – or via PyTest – whose default is to collect test_-prefixed functions and methods, and which thus also implements the TestCase pattern. Assuming one sticks to tools using this pattern, the pitfalls and distinctions above are – literally – academic, and of interest only to those trying to understand the implementation details.
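A minimal sketch of the extended assertion as it might actually be written under this pattern – the helper name assert_has_key is invented for illustration:

# A reusable extended assertion: just a helper that raises
def assert_has_key(d, key):
    if key not in d:
        raise AssertionError(
            "Key %s does not exist in dict; extant keys: %s" %
            (repr(key), ", ".join(d.keys())))

# A named test block – collected by PyTest, or equally usable as a
# unittest TestCase method – acting as the identifiable meta-assertion
def test_waldo_found():
    d = {"Foo": 1, "Bar": 2, "Baz": 3}
    assert_has_key(d, "Waldo")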

Ruby

Unlike either Perl or Python, Ruby’s testing tools have not coalesced around any shared conventions. The approach of wrong has been examined in a previous section – a library with adaptors that allow a given test assertion function to raise an exception of the appropriate type for the testing library being used.

Comparing Ruby to Perl and Python by developing a specific extended diagnostic assertion therefore seems unrewarding. However, both wrong and Test::Unit have interesting takes on how their assertion diagnostics are raised, so this section will look at them in more detail.

Specifically unmentioned here are minitest, which takes the same approach as Python’s unittest, and RSpec, a Behaviour Driven Development tool which will be looked at in a little more detail at the same time as Cucumber.

wrong

Every other testing assertion library looked at so far provides a method for asserting truth, a method for asserting equality with some diagnostic capabilities, and a set of other extended diagnostic testing assertions.

wrong provides only a single method – assert {block} – which accepts a block of code expected to be a predicate expression. When the block evaluates to true, the code moves on. When the block evaluates to false, a more in-depth process is kicked off.

The assert method determines which file on the filesystem it’s in, and at what line number; that file is then opened and the block is located and statically parsed. The predicate expression in the block is then split into sub-expressions (if they exist), and the boolean value of each is shown. For example, from the documentation:

x = 7; y = 10; assert { x == 7 && y == 11 }
 ==>
Expected ((x == 7) and (y == 11)), but
    (x == 7) is true
    x is 7
    (y == 11) is false
    y is 10
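For the example above to run, wrong first needs to be loaded and mixed in – per its documentation:

require 'wrong'
include Wrong

x = 7; y = 10
assert { x == 7 && y == 11 }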

wrong’s documentation explicitly discourages adding identifier names to testing assertions created with assert, on the basis that the predicate itself should be sufficient documentation: “if your assertion code isn’t self-explanatory, then that’s a hint that you might need to do some refactoring until it is”. In the example above, x == 7 && y == 11 is expected to act both as the identifier and the assertion; see also Literate Programming.

On failure, and in raising its own exception class, wrong merges the stringified predicate that acts as an identifier into its diagnostics of failure. This approach extends to its design of adaptors for other exception classes too. While Test::Unit’s exception class (examined next) supports a distinction between the two, wrong assumes that all exception classes it has been adapted to raise will simply use a single string containing both diagnostics and assertion identifier.

Test::Unit

Test::Unit is an occasionally-bundled-with-Ruby xUnit derivative, which provides an assert_equal() testing assertion. Like other xUnit descendants (such as unittest above), it requires testing assertions to be used in named test case blocks, which it uses to identify tests.

By default:

def test_simple
  actual = "Bar"
  assert_equal("Foo", actual, "It's a Foo" )
end

Will die very colourfully, but will interestingly not conflate diagnostics and identifiers:

Failure: test_simple(TUSimple)
TU_Simple.rb:8:in `test_simple'
      5: 
      6:   def test_simple
      7:     actual = "Bar"
  =>  8:     assert_equal("Foo", actual, "It's a Foo" )
      9:   end
     10: 
     11: end
It's a Foo
<"Foo"> expected but was
<"Bar">

The enclosing test case is what’s marked as failed (Failure: test_simple(TUSimple)), and the testing assertion’s name is presented separately (It's a Foo) to the diagnostic message (<"Foo"> expected but was <"Bar">) and stack trace.

Indeed, Test::Unit raises exceptions of the class Test::Unit::AssertionFailedError, which has explicit attributes supporting an expected value, an actual value, and a message, separately.
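As a rough sketch of what that structure makes possible – the attribute names below match the test-unit gem, though the exact require path and standalone-usage details should be treated as an assumption:

require 'test/unit'
include Test::Unit::Assertions

begin
  assert_equal("Foo", "Bar", "It's a Foo")
rescue Test::Unit::AssertionFailedError => e
  # Identifier and diagnostics arrive as separate attributes,
  # not one merged string
  puts e.user_message  # => "It's a Foo"
  puts e.expected      # => "Foo"
  puts e.actual        # => "Bar"
end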

This seems like a best-of-both-worlds approach – passing the diagnostic information back to the test harness distinct from both the testing assertion name and the wider containing test name. Test::Unit – via plugins – is able to support output formats that make use of this distinction, including a TAP output module.

Summary

Perl’s testing tools have a fundamental philosophical difference from those of Python and Ruby: failed testing assertions are not raised as exceptions (and as mentioned right at the start, are not then true assertions).

Both approaches seem to have advantages and disadvantages.

Perl’s approach means that a failed test assertion doesn’t prevent other nearby assertions from being run – one can run a lengthy series of assertions, and a single failing one near the beginning won’t stop further potentially useful diagnostics from being generated. However, a failing assertion near the beginning of a test run may invalidate the results of all the assertions that follow, rendering those diagnostics useless.

No pressure is placed on the developer writing their tests to organize them into small, named testing units, and – entirely anecdotally – this often leads to tests in Perl being written in a long, meandering style that mixes testing assertions directly into fixture code with no clear separation.

Python and Ruby’s approach of using raised exceptions – and thus “genuine” testing assertions – places pressure on the developer to avoid long sequences of assertions, and to hew close to the xUnit ideal that “a test method should only contain a single test assertion” (Hamill 2004).

Of the Python and Ruby libraries touched upon, only Test::Unit takes advantage of the fact that both languages have structured exception objects to add useful diagnostic information to the raised exceptions. The combination of the pressure to organize tests into named blocks, and the inability to see the results of testing assertions occurring directly after failed ones, has meant that these enclosing blocks are considered to be the tests run, not the individual assertions. In that context, these blocks are the smallest unit identified by test harnesses, and the messages contained in raised exceptions are seen solely as diagnostic information.


Ascher, David, and Mark Lutz. 1999. “Functions > Scope Rules in Functions.” In Learning Python. O’Reilly. https://www.safaribooksonline.com/library/view/learning-python/1565924649/ch04s03.html.

Beck, Kent. 1994. “Simple Smalltalk Testing: With Patterns.” The Smalltalk Report 4 (2): 16–18.

Flanagan, David, and Yukihiro Matsumoto. 2008. “Classes and Modules > Singleton Methods and the Eigenclass.” In The Ruby Programming Language. O’Reilly Media, Inc. https://www.safaribooksonline.com/library/view/the-ruby-programming/9780596516178/ch07s07.html.

Goldstine, Herman Heine, and John Von Neumann. 1948. Planning and Coding of Problems for an Electronic Computing Instrument. Institute for Advanced Study. https://library.ias.edu/files/pdfs/ecp/planningcodingof0103inst.pdf.

Hamill, Paul. 2004. Unit Test Frameworks: Tools for High-Quality Software Development. O’Reilly Media, Inc.

“Test Anything Protocol.” https://testanything.org/.

Turing, A. 1949. “Checking a Large Routine.” In Report of a Conference on High Speed Automatic Calculating Machines, 67–69. Cambridge: Univ. Math. Laboratory. http://www.turingarchive.org/viewer/?id=462&title=01.

