[Bioperl-l] t/SeqIO.t -- improvements?

Sendu Bala bix at sendu.me.uk
Thu Sep 7 20:08:53 UTC 2006


Hilmar Lapp wrote:
> On Sep 7, 2006, at 1:37 PM, Jay Hannah wrote:
> 
>> Wouldn't it be better if:
>> (1) fasta.out was generated to contain *all* sequences, not just  
>> the first.
> 
> Possibly.
> 
>> (2) a test was added to verify that fasta.out exactly matches  
>> test.fasta (diff is blank).
> 
> No. The goal is not exact reproduction (you'd use cp for that) but  
> writing out a file that is valid FASTA format and contains the same  
> information as the input file.

Round-trip tests would be extremely valuable and would be /very/ much 
appreciated. The lack of any has left some large bugs (e.g. in taxonomy 
parsing) completely unnoticed and unfixed for years.

Don't just do a simple diff on the output file, since differences may not 
indicate errors (Hilmar's point). Instead, read the output file in again 
and make sure the resulting object (and any objects it contains) 
contains all the same information as the object generated when reading 
the original input file.
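
To make the idea concrete, here is a minimal sketch of such a round-trip 
check. It is written in Python purely for illustration and does not use 
Bioperl; the helpers read_fasta, write_fasta and round_trip are 
hypothetical names, and the tiny parser only understands plain FASTA. A 
real test in t/SeqIO.t would compare the Bioperl sequence objects instead.

```python
import io


def _make_record(header, chunks):
    # Split the header into an id and an optional description.
    seq_id, _, desc = header.partition(" ")
    return (seq_id, desc.strip(), "".join(chunks))


def read_fasta(handle):
    """Parse FASTA text into a list of (id, description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif line.strip() and header is not None:
            chunks.append(line.strip())
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` (assumed value)."""
    for seq_id, desc, seq in records:
        handle.write(f">{seq_id} {desc}".rstrip() + "\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def round_trip(text, width=60):
    """Parse, write, and re-parse; return both parses plus the written text."""
    original = read_fasta(io.StringIO(text))
    out = io.StringIO()
    write_fasta(original, out, width)
    written = out.getvalue()
    reparsed = read_fasta(io.StringIO(written))
    return original, reparsed, written
```

The test then asserts that original and reparsed are equal. The written 
text may legitimately differ from the input (for example in line 
wrapping), which is why the comparison is done on the parsed data and not 
on the files.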

Ideally the output file would also be checked independently of the 
Bioperl parser being tested, but that may only be possible in a limited 
way (otherwise you'd end up writing a whole new parser...). Still, if a 
file format specifies a maximum line width, for example, at least check 
that the output file has no lines longer than that. (Again, this is a 
real problem, and you'll almost certainly discover some bugs related to 
it if you write the tests.)
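
A parser-independent width check can be very small. The sketch below is 
again Python for illustration only; overlong_lines is a hypothetical 
helper, and both the width limit and the choice to skip header lines 
starting with ">" are assumptions that depend on the format being tested.

```python
def overlong_lines(text, max_width, skip_prefixes=(">",)):
    """Return (line_number, length) for every line longer than max_width.

    Lines starting with one of skip_prefixes (e.g. FASTA headers) are
    ignored, on the assumption that the limit applies to data lines only.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(skip_prefixes):
            continue
        if len(line) > max_width:
            problems.append((number, len(line)))
    return problems
```

A test would run this on the generated output file and expect an empty 
list; any entries point straight at the offending line numbers.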


So if you have the time, please add tests. If they reveal bugs, attach 
them to specific new bug reports (http://bugzilla.open-bio.org/), or 
just email your patch(es) directly to me.

Cheers,
Sendu.


