Bioperl: Bug in EMBL parser: expects FH/FT lines

Ewan Birney birney@ebi.ac.uk
Thu, 15 Jun 2000 12:13:16 +0100 (GMT)


On Thu, 15 Jun 2000, Peter van Heusden wrote:

> Hi All
> 
> There is a bug in the way the Bio::SeqIO::embl module parses EMBL, and the
> way it writes EMBL entries.
> 
> As I read the EMBL manual, 3.3 Structure of an Entry
> (http://www.ebi.ac.uk/embl/Documentation/User_manual/structure_entry.html)
> there is a defined order of particular lines, but there is no requirement
> that all types of lines are present. In particular, there may be 
> 0 or more FH/FT lines. 
> 
> Unfortunately, the embl.pm next_seq() function explicitly expects ID,
> then various optional things, then FH, then FT, then SQ. This means that
> an EMBL entry without FT lines is not only ignored, but causes
> next_seq() to read to the end of the file, discarding all lines along the
> way. A rather major bug, in my opinion.
> 
> Secondly, and the reason I discovered this, the embl.pm
> write_seq() function always writes the FH lines, and only writes FT lines
> if there are features present. Surely it should check to see if there are
> features before writing FH lines?
> 
> In some of our work here, we converted a set of Fasta entries to EMBL,
> using Bio::SeqIO, and then (some time later) tried to convert the
> resulting EMBL entries to Fasta, at which point the above behaviour was 
> discovered.
> 
> If everyone agrees with my understanding of how things should work, I
> can submit patches...

You are bang on Peter. Feel free to submit patches and/or commit directly
to cvs. 

If you commit to cvs, commit on both branch-06 (check out with -r
branch-06) and the main trunk. If you like, you can use
cvs update -j branch-06 embl.pm, which is a quick way of moving fixes
from the 06 branch to the main trunk.
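
For reference, the behaviour Peter describes can be sketched roughly as
follows. This is a minimal Python illustration and not the
Bio::SeqIO::embl code: the names read_entries and write_entry are
hypothetical, and the ID and SQ lines are simplified well below what
the EMBL manual specifies. The reader dispatches on the two-letter line
code, so an entry without FH/FT lines does not swallow the entries after
it, and the writer emits FH lines only when there are features.

```python
def read_entries(lines):
    """Yield (entry_id, feature_lines, sequence) for each entry.

    Lines are dispatched on their two-letter prefix, so FH/FT lines
    may be absent without affecting the entries that follow.
    """
    entry_id, features, seq_parts, in_seq = None, [], [], False
    for line in lines:
        line = line.rstrip("\n")
        code = line[:2]
        if code == "//":  # end-of-entry terminator
            if entry_id is not None:
                yield entry_id, features, "".join(seq_parts)
            entry_id, features, seq_parts, in_seq = None, [], [], False
        elif code == "ID":
            # assumption: the first token after the line code is the entry name
            entry_id = line[5:].split()[0].rstrip(";")
        elif code == "FT":
            features.append(line[5:].rstrip())
        elif code == "SQ":
            in_seq = True
        elif in_seq:
            # sequence lines: keep the residues, drop spaces and counts
            seq_parts.append("".join(c for c in line if c.isalpha()))
        # any other line type (FH, DE, XX, ...) is skipped here


def write_entry(entry_id, features, sequence):
    """Return the lines of one simplified EMBL-style entry."""
    out = [f"ID   {entry_id}", "XX"]
    if features:  # write the FH header only when there are FT lines
        out.append("FH   Key             Location/Qualifiers")
        out.append("FH")
        out.extend(f"FT   {feature}" for feature in features)
        out.append("XX")
    out.append(f"SQ   Sequence {len(sequence)} BP;")
    out.append(f"     {sequence}")
    out.append("//")
    return out
```

With this shape, a file written from Fasta-derived sequences (no
features) contains no FH lines, and reading it back returns every entry
rather than running to the end of the file.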



> 
> Peter
> --
> Peter van Heusden				pvh@egenetics.com
> Electric Genetics
> 
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------
