[Biopython-dev] "Online" tests, was [Bug 1972]

Thu Mar 30 20:33:55 UTC 2006

On Fri, 2006-03-24 at 14:36 +0000, Peter wrote:
> Bill Barnard wrote:
> > Perhaps another test case similar to test_Registry should be implemented
> > with the purpose of detecting DB format changes. I'll be happy to have a
> > go at it, should you want such a test.

> Peter wrote:
> I think this is an excellent idea - but it would be good to have an 
> opinion from some of the more seasoned BioPython developers.
> 
> Putting these online tests into separate unit test(s) will make tracing 
> unit test failures simply due to being offline much easier.
> 
> I could probably help out with some of the formats - but I am by no 
> means familiar with them all.
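
On the point about failures caused simply by being offline: one option
is for each online test to skip itself when no connection can be made,
so it is reported as skipped rather than failed. A minimal sketch,
assuming a unittest recent enough to have skipTest; the helper
network_available, the class OnlineTestCase and the host name are
placeholders of mine, not anything in Biopython:

```python
import socket
import unittest


def network_available(host="www.expasy.org", port=80, timeout=3.0,
                      connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened.

    The host is only an illustrative default. `connect` is injectable
    so the check itself can be exercised without a network.
    """
    try:
        conn = connect((host, port), timeout)
    except OSError:  # covers DNS failures, refusals and timeouts
        return False
    conn.close()
    return True


class OnlineTestCase(unittest.TestCase):
    """Hypothetical base class: skip, rather than fail, when offline."""

    def setUp(self):
        if not network_available():
            self.skipTest("no network connection")
```

Online tests would then derive from OnlineTestCase and call its setUp
before doing their own set-up.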

I've made a first-cut unit test, tentatively named
test_Parsers_for_newest_formats, which retrieves and parses some small
Prosite, Prodoc, SwissProt, and Medline records. I tried these types
first, based on a quick search of the code tree for existing code that
makes use of Bio.WWW.

[billb@tioga Bio]$ find . -name "*.py" | xargs grep "Bio\.WWW"

yields (in part)

./Prosite/Prodoc.py:from Bio.WWW import ExPASy
./Prosite/__init__.py:from Bio.WWW import ExPASy
./SwissProt/SProt.py:from Bio.WWW import ExPASy
./PubMed.py:from Bio.WWW import NCBI

./Blast/NCBIWWW.py:    from Bio.WWW import NCBI

The first four are easy to test with code like:

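import unittest

from Bio import Prosite


class ExpasyTest(unittest.TestCase):
    """Test that Expasy parsers can read the current database formats
    """
    def setUp(self):
        # Dictionary-style access that fetches records from ExPASy
        # and parses each one into a Prosite record.
        self.prosite_dict = Prosite.ExPASyDictionary(
            parser=Prosite.RecordParser())

    # Named t_*, so the loader must be given the 't_' prefix
    # (unittest's default prefix is 'test').
    def t_read_record(self):
        """Retrieve a Prosite record and parse it
        """
        accession = 'PS00159'
        entry = self.prosite_dict[accession]
        # The parsed record should be the one that was asked for.
        self.assertEqual(entry.accession, accession)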

Testing Blast in the same way doesn't seem sensible to me, and it looks
as though any effort there should be in the XML Parser area, rather than
in the thankless task of parsing HTML. (I suspect that's what you've
already decided.)

> In some cases (e.g. GenBank, Fasta) once the sample file is downloaded 
> there are multiple parsers to be checked (e.g. record and feature parsers).

I'll take a look at more parsers, as I figure out where they are. I will
take the same approach of looking through the code tree for existing
parsers using find/grep. It looks as though there are a fair number
which may be obsolete. I would appreciate any guidance in figuring out
which ones would be most useful to check.
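
For anyone without find/grep to hand, the same search can be sketched
in Python. This is only a rough equivalent of the command line above;
the function name modules_mentioning is made up for illustration, and
it does a plain substring match rather than a regular expression:

```python
import os


def modules_mentioning(root, needle="Bio.WWW"):
    """Return sorted paths of .py files under root whose text contains needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps odd encodings from aborting the scan
            with open(path, encoding="utf-8", errors="replace") as handle:
                if any(needle in line for line in handle):
                    hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    for path in modules_mentioning("."):
        print(path)
```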

(Is this exercise useful? I was just learning my way around the code
using the on-line course at the Pasteur Institute, and found a minor bug
which I fixed. Since any bug should really be covered by a test as well
as being fixed, I wanted to now add the test. I like cleaning up
problems as I find them, but I may not be doing anything that's of more
than minor utility for Biopython...)

> We should probably produce a streamlined test output file WITHOUT 
> details which are likely to change in later versions of the test file 
> e.g. revisions to genbank files.

Since the test only verifies that the record can be retrieved and parsed,
and that it is the record actually requested, it emits very little
output. My last run emitted:

[billb@tioga Tests]$ python test_Parsers_for_newest_formats.py
Retrieve a Prodoc record and parse it ... ok
Retrieve a Prosite record and parse it ... ok
Retrieve a SwissProt record and parse it into Record format ... WARNING - Ignoring line: DT   20-DEC-2005, integrated into UniProtKB/Swiss-Prot.

WARNING - Ignoring line: DT   07-DEC-2004, sequence version 1.

WARNING - Ignoring line: DT   07-FEB-2006, entry version 10.

ok
Retrieve a SwissProt record and parse it into Sequence format ... ok
Retrieve a PubMed record and parse it ... ok

----------------------------------------------------------------------
Ran 5 tests in 3.085s

OK

Is this what you mean by "streamlined test output"?

> One question is should the test "cache" any downloaded files (say for a 
> day) which would be helpful for anyone trying to debug a particular 
> issue and re-running the online tests?  Or is this just making life too 
> complicated?

This could be done, but I doubt I would do it unless it really seemed
useful...
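
For what it's worth, the caching itself would be small. A rough sketch,
assuming the download is wrapped in some callable; cached_fetch and
its fetch argument are hypothetical names, not existing Biopython
functions, and the one-day lifetime is just the figure suggested above:

```python
import os
import time


def cached_fetch(name, fetch, cache_dir, max_age=24 * 60 * 60, now=time.time):
    """Return the text for name, re-downloading only if the cached copy is stale.

    fetch is any callable taking name and returning the record text; it
    stands in for the real download. now is injectable for testing.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, name)
    # Reuse the file if it exists and is younger than max_age seconds.
    if os.path.exists(path) and now() - os.path.getmtime(path) < max_age:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    text = fetch(name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text
```

Keeping the raw text rather than the parsed record means a re-run
still exercises the parser, which is the part being tested.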

Lazily yours,

Bill