[Bioperl-l] Test related Suggestions

Thu Jul 5 15:54:42 UTC 2007

On Jul 5, 2007, at 9:58 AM, Nathan S. Haigh wrote:

> Quoting Chris Fields <cjfields at uiuc.edu>:
>
>>
>> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:
>>
>>>
>>> One more suggestion:
>>>
>>> It would be extremely useful if we had a standard way of testing
>>> that when a file is read into a bioperl object and then written out
>>> again in the same format, the input and output files are identical.
>>> If not, the test should show where the differences start (showing
>>> all the differences would just clutter the screen).
>>>
>>> This standard method/subroutine should be used to test all sequence
>>> and other text file IO.
>>>
>>> Any takers?
>>>
>>> 	-Heikki
>> ...
>>
>> I agree.  There are some 'round-trip' tests in genbank.t or SeqIO.t
>> that do some checking, I think, but something like this would be
>> useful.  However, what if the test file is old (as many in t/data
>> are) and the format has since changed?  GenBank and EMBL, for
>> instance, have gone through several format changes.
>>
>> chris
>>
>>
>
> Is there any way to tell variants apart other than by layout,
> e.g. a version number or the like?
>
> Nath

I don't think so; this veers back into the whole validation issue
(i.e. whether the record conforms to a given specification).  There
are seq records from different sources which bioperl is expected to
parse, for example Ensembl GenBank records.  Some of those have
feature tags or annotation fields which may not appear in the output
when using write_seq().
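Heikki's round-trip check, combined with the point that some input fields may legitimately be dropped by write_seq(), could be sketched as a small comparison helper that reports only the first mismatch. This is a language-neutral sketch written in Python, not BioPerl code: in an actual t/*.t test the two line lists would be the original t/data file and the file produced by write_seq(). The function name first_difference and its ignore parameter are hypothetical, not part of any existing BioPerl test utility.

```python
import re
from itertools import zip_longest


def first_difference(expected_lines, actual_lines, ignore=()):
    """Compare two line sequences and report only the first mismatch.

    Returns None if they match, otherwise a tuple
    (expected_lineno, expected_text, actual_lineno, actual_text).
    If one side runs out of lines first, its entries are (None, None).
    Lines matching any regex in `ignore` are skipped on both sides
    (hypothetical option for fields a writer is known not to emit).
    """
    patterns = [re.compile(p) for p in ignore]

    def kept(lines):
        # Keep original 1-based line numbers so the report points into the files.
        for lineno, line in enumerate(lines, start=1):
            text = line.rstrip("\r\n")
            if not any(p.search(text) for p in patterns):
                yield lineno, text

    for exp, act in zip_longest(kept(expected_lines), kept(actual_lines)):
        exp_no, exp_text = exp if exp is not None else (None, None)
        act_no, act_text = act if act is not None else (None, None)
        # Compare text only; line numbers can drift when lines are ignored.
        if exp_text != act_text:
            return exp_no, exp_text, act_no, act_text
    return None
```

Stopping at the first mismatch keeps the test output short, as suggested; the ignore list is one way to tolerate known omissions without hiding other differences.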

I think it's less important to replicate the input exactly in the
output than to have the data represented in a Bio::Seq object (or any
other Bio* instance) in a consistent manner, with the ability to
incorporate new fields (such as the recent addition of genome
projects) transparently.  The latter is hard to do with the current
GenBank parser (you have to code for each new field specifically), but
it is a bit easier with the driver-handler model I'm working on.
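The alternative described here, checking that the parsed data survive a round trip rather than the exact text, might look like the following sketch. It assumes each record has already been reduced to a plain dictionary of field names to values, standing in for accessor calls on a Bio::Seq object; the dictionary layout and the name compare_records are hypothetical.

```python
def compare_records(expected, actual, ignore_fields=()):
    """Compare two parsed records (dict of field name -> value).

    Returns a list of (field, expected_value, actual_value) for every field
    whose value differs, sorted by field name. A field missing on one side
    is reported with None for that side. Fields in `ignore_fields` are skipped.
    """
    fields = (set(expected) | set(actual)) - set(ignore_fields)
    return [
        (field, expected.get(field), actual.get(field))
        for field in sorted(fields)
        if expected.get(field) != actual.get(field)
    ]
```

In a round trip one would parse the input, write it out, re-parse the output and compare the two dictionaries; a change in layout between format versions then does not fail the test, only lost or altered data does.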

chris