[Biopython-dev] Rethinking Biopython's testing framework

Giovanni Marco Dall'Olio dalloliogm at gmail.com
Tue Dec 30 18:34:45 UTC 2008


On Fri, Nov 28, 2008 at 12:09 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> Brad wrote:
>> Agreed with the distinction between the unit tests and the "dump
>> lots of text and compare" approach. I've written both and do think
>> the unit testing/assertion model is more robust since you can go
>> back and actually get some insight into what someone was thinking
>> when they wrote an assertion.
>
> I have probably written more of the "dump lots of text and compare"
> style tests.  I think these have a number of advantages:
> (1) Easier for beginners to write a test: you can almost take any
> example script and use that.  You don't have to learn the unit test
> framework.

I agree with what you say, but I think that all the 'dump and compare'
tests should be organized into separate functions.
That will make them easier to use and understand, and it will also
make them compatible with the nose framework.
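
For example, something like this (the file name and the expected
record count are just placeholders): plain "test_*" functions can be
collected by nose without any extra framework.

    # test_SeqIO_functions.py: dump-and-compare checks organised as
    # plain functions, so nose (or any "test_*" collector) finds them.
    from Bio import SeqIO

    def test_record_count():
        # Parse a sample file and compare against the expected count.
        records = list(SeqIO.parse(open("Fasta/f002"), "fasta"))
        assert len(records) == 3, "expected 3 records, got %i" % len(records)

    def test_no_empty_sequences():
        # Every record in the sample file should have a sequence.
        for record in SeqIO.parse(open("Fasta/f002"), "fasta"):
            assert len(record.seq) > 0, "empty sequence in %s" % record.id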

> (2) Debugging a failing test in IDLE is much easier - using unit tests
> you have all that framework between you and the local scope where the
> error happens.

> (3) For many broad tests, manually setting up the expected output for
> an assert is extremely tedious (e.g. parsing sequences and checking
> their checksums).

This is an interesting point, if you want to discuss it a bit more.

One advantage of unittest is its setUp and tearDown methods (fixtures).
With those, you can be sure that every test runs in the right
environment and that all variables are dropped before the next test
executes.
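
A minimal sketch of what I mean (the sample file name here is just an
example):

    import unittest
    from Bio import SeqIO

    class ParserFixtureTest(unittest.TestCase):

        def setUp(self):
            # Runs before every test method, so each one starts from a
            # fresh file handle.
            self.handle = open("Fasta/f002")

        def tearDown(self):
            # Runs after every test method, even when the test fails.
            self.handle.close()

        def test_parses_at_least_one_record(self):
            records = list(SeqIO.parse(self.handle, "fasta"))
            self.assertTrue(len(records) > 0)

    if __name__ == "__main__":
        unittest.main()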

Also, if you want to write a lot of dump-and-compare tests, consider
writing some big doctest scripts.
They will take a bit more work to write, but they will be easier to
understand, and they will also serve as good tutorials for the users.

This is a tutorial we wrote for a small project not related to Biopython:
- http://github.com/cswegger/datamatrix/tree/master/tutorial.txt
As you can see, the text is both a tutorial and a test set (which makes
use of a dump-and-compare approach) for the program.
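
Something along the same lines could work for Biopython too. This
fragment is invented, not taken from that tutorial, but it shows the
idea: run the file through doctest.testfile("tutorial.txt") and every
interactive example is checked against the output shown.

    The Seq object behaves much like a Python string:

    >>> from Bio.Seq import Seq
    >>> my_seq = Seq("AGTACACTGGT")
    >>> len(my_seq)
    11
    >>> my_seq.tostring()
    'AGTACACTGGT'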

> We could discuss a modification to run_tests.py so that if there is no
> expected output file output/test_XXX for test_XXX.py we just run
> test_XXX.py and check its return value (I think Michiel had previously
> suggested something like this).

I think this should be done inside the test itself.
Every test should return only a boolean value (passed or not) and a
description of the error.
The tests that use an expected output file should open it and do the
comparison themselves, not leave it to run_tests.py.
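
Roughly like this (the function names are invented, just to show the
shape; the expected output file is opened by the test module itself):

    # Sketch for a test_XXX.py that checks its own expected output,
    # instead of leaving the comparison to run_tests.py.
    import sys

    def run_test():
        # ...whatever the test actually does; here just a dummy dump.
        return "test_example ... ok\n"

    def compare_with_expected(produced, expected_file="output/test_XXX"):
        expected = open(expected_file).read()
        if produced == expected:
            return True, ""
        return False, "output does not match %s" % expected_file

    if __name__ == "__main__":
        passed, message = compare_with_expected(run_test())
        if not passed:
            print message
        sys.exit(0 if passed else 1)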

> Perhaps for more robustness, capture
> the output and compare it to a predefined list of regular expressions
> covering the typical outputs.  For example, looking at
> output/test_Cluster, the first line is the test name, but the rest follows
> the pattern "test_... ok".  I imagine only a few output styles exist.

Hmm, have you changed this file in CVS recently? I can't find what
you are referring to.

> With such a change, half the unit tests (e.g. test_Cluster.py)
> wouldn't need their output file in CVS (output/test_Cluster).
>
> Michiel de Hoon wrote:
>> If one of the sub-tests fails, Python's unit testing framework will tell us so,
>> though (perhaps) not exactly which sub-test fails. However, that is easy to
>> figure out just by running the individual test script by itself.
>
> That won't always work.  Consider intermittent network problems, or
> tests using random data - in general it really is worthwhile having
> run_tests.py report a little more than just which test_XXX.py module
> failed.
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>



-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


