[Bioperl-l] EMBL ID line parsing error
Heikki Lehvaslaiho
heikki at ebi.ac.uk
Wed Jul 13 09:06:24 EDT 2005
I noticed that one BioFetch test was failing because an EMBL entry object
had no display ID. The cause was a regular expression in the EMBL parser
that did not allow spaces in the molecule substring of the ID line:
ID BUM standard; genomic RNA; VRL; 200 BP.
was: (\S+);
fix: ([\S ]+); now in bioperl-live
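To illustrate the difference, here is a minimal sketch in Python (the regex syntax is the same as in Perl for these fragments). The full patterns and the parse_id_line helper are simplified assumptions for demonstration, not the actual code in the BioPerl EMBL parser, and the second ID line is a made-up entry.

```python
import re

# Hypothetical, simplified ID-line patterns; NOT the actual BioPerl source.
# Capture groups: entry name, molecule, division.
OLD_ID = re.compile(r"^ID\s+(\S+)\s+\S+;\s+(\S+);\s+(\S+);")
NEW_ID = re.compile(r"^ID\s+(\S+)\s+\S+;\s+([\S ]+);\s+(\S+);")


def parse_id_line(pattern, line):
    """Return (name, molecule, division), or None if the line does not match."""
    match = pattern.match(line)
    return match.groups() if match else None


# ID line from the failing test: the molecule value contains a space.
with_space = "ID   BUM        standard; genomic RNA; VRL; 200 BP."
# Made-up entry whose molecule value has no space.
without_space = "ID   EXAMPLE1   standard; mRNA; PLN; 1859 BP."

for line in (with_space, without_space):
    print("old:", parse_id_line(OLD_ID, line))
    print("new:", parse_id_line(NEW_ID, line))
```

With the old fragment the "genomic RNA" line does not match at all, so none of the fields get set. With the fix, both lines parse.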
The affected Bio::Seq::RichSeq methods are:
display_id(), id(), molecule(), division()
Here is a breakdown of all molecule values in the current EMBL release:
circular genomic dna          7427
circular genomic rna           687
circular mrna                   23
circular other dna             915
circular other rna               9
circular trna                    1
circular unassigned dna        266
circular unassigned rna          2
genomic dna               14573961
genomic rna                 152219
mrna                      28138477
other dna                     6956
other rna                     1827
pre-rna                        898
rrna                          5999
scrna                           95
snorna                         981
snrna                          455
trna                           667
unassigned dna             1941868
unassigned rna              102162
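Every entry whose molecule value contains a space is affected: 16788323 of
44935895 entries, or a little over one third (about 37%) of EMBL.
This error does not affect GenBank entries, which use a different syntax.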
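The proportion can be checked against the breakdown above. This is a small sketch in Python that assumes an entry is affected exactly when its molecule value contains a space. The variable names are illustrative only.

```python
# Molecule value counts copied from the breakdown above.
counts = {
    "circular genomic dna": 7427,
    "circular genomic rna": 687,
    "circular mrna": 23,
    "circular other dna": 915,
    "circular other rna": 9,
    "circular trna": 1,
    "circular unassigned dna": 266,
    "circular unassigned rna": 2,
    "genomic dna": 14573961,
    "genomic rna": 152219,
    "mrna": 28138477,
    "other dna": 6956,
    "other rna": 1827,
    "pre-rna": 898,
    "rrna": 5999,
    "scrna": 95,
    "snorna": 981,
    "snrna": 455,
    "trna": 667,
    "unassigned dna": 1941868,
    "unassigned rna": 102162,
}

# Assumption: the old (\S+) fragment fails exactly when the value has a space.
affected = sum(n for molecule, n in counts.items() if " " in molecule)
total = sum(counts.values())

print(f"{affected} of {total} entries affected ({100 * affected / total:.1f}%)")
```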
I wonder how long this error has been there!
-Heikki
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambridge, CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________