[Bioperl-l] Problems parsing scientific name from a Genbank file

Roy Chaudhuri roy.chaudhuri at gmail.com
Fri Jun 19 10:34:24 UTC 2009


Hi Cesar,

I can replicate this with an old Bioperl (version 1.5.2), but it
appears to be fixed in version 1.6 and in bioperl-live, where the
scientific_name method returns "Bacillus anthracis str. Sterne".
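
If upgrading is not an option, a rough stopgap is to split the ORGANISM
block yourself. The sketch below is plain Perl and is not Bioperl's own
parser code; the function name split_organism_block is made up for
illustration, and it assumes the organism name fits entirely on the first
line of the block (a very long name that wraps onto a second line is not
handled).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bioperl: split the text of a GenBank
# ORGANISM block into (scientific name, lineage).
# Assumption: the name sits entirely on the first line of the block.
sub split_organism_block {
    my ($block) = @_;

    # Keep only non-blank lines.
    my @lines = grep { /\S/ } split /\n/, $block;
    return unless @lines;

    # First line: strip the ORGANISM keyword and trailing whitespace.
    (my $name = shift @lines) =~ s/^\s*ORGANISM\s+//;
    $name =~ s/\s+$//;

    # Remaining lines: the lineage, which may wrap across several lines.
    my $lineage = join ' ', @lines;
    $lineage =~ s/\s+/ /g;    # collapse the indentation left by wrapping
    $lineage =~ s/^\s+//;
    $lineage =~ s/\s+$//;
    $lineage =~ s/\.$//;      # drop the terminating period

    return ($name, $lineage);
}

# The ORGANISM block from NC_005945, as quoted in the report below.
my $block = <<'END';
  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
            cereus group.
END

my ($name, $lineage) = split_organism_block($block);
print "name:    $name\n";
print "lineage: $lineage\n";
```

The line break inside "Bacillus cereus group" is harmless here because
everything after the first line is treated as lineage, regardless of where
it wraps.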

Hope this helps.
Roy.

Cesar Arze wrote:
> Hi all,
>    I've searched the mailing list and bug tracker for any mention of what
> I presume to be a bug that I have been encountering when parsing certain
> Genbank files using SeqIO::GenBank, but have yet to find anything. I
> apologize in advance if this is something that has already been
> addressed.
> 
> When parsing these files and extracting the scientific name it seems that
> line breaks are causing the lineage info found in the ORGANISM section to be
> captured as part of the scientific name. An example of this is accession
> NC_005945:
> 
>   ORGANISM  Bacillus anthracis str. Sterne
>             Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
>             cereus group.
> 
> The lineage entry "Bacillus cereus group" is split by a line break after
> "Bacillus", which causes the scientific name to also capture "Bacteria;
> Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus", so the final
> scientific name ends up as "Bacillus anthracis str. Sterne Bacteria;
> Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus".
> 
> I'm not sure if anyone has ever run into this problem, but I would very
> much appreciate any help or direction.



