[Bioperl-l] t/SeqIO.t -- improvements?
Sendu Bala
bix at sendu.me.uk
Thu Sep 7 20:08:53 UTC 2006
Hilmar Lapp wrote:
> On Sep 7, 2006, at 1:37 PM, Jay Hannah wrote:
>
>> Wouldn't it be better if:
>> (1) fasta.out was generated to contain *all* sequences, not just
>> the first.
>
> Possibly.
>
>> (2) a test was added to verify that fasta.out exactly matches
>> test.fasta (diff is blank).
>
> No. The goal is not exact reproduction (you'd use cp for that) but
> writing out a file that is valid FASTA format and contains the same
> information as the input file.
Round-trip tests would be extremely valuable and would be /very/ much
appreciated. The lack of any has left some large bugs (eg. in taxonomy
parsing) completely unnoticed/unfixed for years.
Don't just do a simple diff on the output file since differences may not
indicate errors (Hilmar's point). Instead read the output file in again
and make sure the resulting object (and any objects it contains)
contains all the same information as the object generated when reading
the original input file.
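
To make the read/write/read-again idea concrete, here is a minimal sketch in Python. It does not use Bioperl or its API: `Record`, `parse_fasta`, `write_fasta` and `round_trip_ok` are hypothetical stand-ins for whatever parser and writer are under test, and the 60-column wrap is an arbitrary assumption. The point is only that the comparison is made between parsed objects, not between files.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for a parsed sequence object."""
    identifier: str
    description: str
    sequence: str


def _make_record(header: str, chunks: List[str]) -> Record:
    # Split the header into an ID and a free-text description.
    identifier, _, description = header.partition(" ")
    return Record(identifier, description.strip(), "".join(chunks))


def parse_fasta(text: str) -> List[Record]:
    """Toy FASTA reader (assumption: not a full implementation of the format)."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


def write_fasta(records: List[Record], width: int = 60) -> str:
    """Toy FASTA writer; wraps sequence lines at `width` columns."""
    lines = []
    for rec in records:
        lines.append(f">{rec.identifier} {rec.description}".rstrip())
        for i in range(0, len(rec.sequence), width):
            lines.append(rec.sequence[i:i + width])
    return "\n".join(lines) + "\n"


def round_trip_ok(original_text: str) -> bool:
    """Parse, write, parse again, then compare the parsed objects."""
    first = parse_fasta(original_text)
    second = parse_fasta(write_fasta(first))
    return first == second
```

With an input whose sequence is wrapped differently from the writer's layout, the written text differs from the original (so a plain diff would report a change), yet `round_trip_ok` still returns true because every record carries the same information.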
Ideally the output file would also be checked independently of the
Bioperl parser being tested, but that may be only possible in a limited
way (otherwise you'd end up writing a whole new parser...). But eg. if a
file format specifies that there is a maximum line width, at least check
that the output file has no lines longer than that. (Again, a real
problem, and you'll almost certainly discover some bugs related to this
if you write the tests.)
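
A parser-independent check of this kind can be very small. The sketch below, again in Python and again hypothetical (the name `lines_exceeding` and the limit passed to it are assumptions, the real limit has to come from the format specification), just reports any output lines that are too long.

```python
from typing import List, Tuple


def lines_exceeding(text: str, max_width: int) -> List[Tuple[int, int]]:
    """Return (line_number, length) for every line longer than max_width.

    Line numbers are 1-based; line terminators are not counted.
    """
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > max_width
    ]
```

A test would read the written file as plain text, call `lines_exceeding(contents, limit)` and assert that the result is empty; on failure the returned pairs say exactly which lines to look at.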
So if you have the time, please add tests and attach them to specific
new bug reports if your tests reveal bugs
(http://bugzilla.open-bio.org/), or just email your patch(es) directly to me.
Cheers,
Sendu.