[Bioperl-l] EMBL ID line parsing error

Heikki Lehvaslaiho heikki at ebi.ac.uk
Wed Jul 13 09:06:24 EDT 2005


I noticed that one BioFetch test was failing because an EMBL entry object 
had no display ID. The cause was the regular expression in the EMBL parser, 
which did not allow spaces in the molecule substring of the ID line:


ID   BUM        standard; genomic RNA; VRL; 200 BP.
                   was:   (\S+);
                   fix:   ([\S ]+);     now in bioperl-live
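
To make the difference concrete, here is a minimal sketch in Python (not the actual BioPerl parser code). The two full patterns are simplified assumptions built around the fragments quoted above; only the molecule group differs between them. The helper name parse_id_line is hypothetical.

```python
import re

# The ID line quoted in the message.
ID_LINE = "ID   BUM        standard; genomic RNA; VRL; 200 BP."

# Hypothetical, simplified patterns: only the molecule group differs.
# Groups: (display id) ... (molecule); (division);
OLD_ID = re.compile(r"^ID\s+(\S+)\s+\S+;\s+(\S+);\s+(\S+);")
NEW_ID = re.compile(r"^ID\s+(\S+)\s+\S+;\s+([\S ]+);\s+(\S+);")


def parse_id_line(pattern, line):
    """Return (display_id, molecule, division), or None if the line does not match."""
    match = pattern.match(line)
    return match.groups() if match else None


print(parse_id_line(OLD_ID, ID_LINE))  # None: "genomic RNA" contains a space
print(parse_id_line(NEW_ID, ID_LINE))  # ('BUM', 'genomic RNA', 'VRL')
```

With the old pattern the whole line fails to match, so none of the captured fields (including the display ID) is set, which is consistent with the failing test described above.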


The affected Bio::Seq::RichSeq methods are:
 display_id(), id(), molecule(), division()

Here is a breakdown of all molecule values in the current EMBL release:

circular genomic dna     7427
circular genomic rna      687
circular mrna              23
circular other dna        915
circular other rna          9
circular trna               1
circular unassigned dna   266
circular unassigned rna     2
genomic dna          14573961
genomic rna            152219
mrna                 28138477
other dna                6956
other rna                1827
pre-rna                   898
rrna                     5999
scrna                      95
snorna                    981
snrna                     455
trna                      667
unassigned dna        1941868
unassigned rna         102162
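
The share of entries hit by the old pattern can be estimated from the counts above. This is a rough sketch in Python, assuming that every entry whose molecule value contains a space failed to parse and that single-word values were unaffected. The function name split_counts is hypothetical.

```python
# Counts copied from the breakdown in the message.
COUNTS = """\
circular genomic dna     7427
circular genomic rna      687
circular mrna              23
circular other dna        915
circular other rna          9
circular trna               1
circular unassigned dna   266
circular unassigned rna     2
genomic dna          14573961
genomic rna            152219
mrna                 28138477
other dna                6956
other rna                1827
pre-rna                   898
rrna                     5999
scrna                      95
snorna                    981
snrna                     455
trna                      667
unassigned dna        1941868
unassigned rna         102162
"""


def split_counts(table):
    """Sum entry counts by whether the molecule value contains a space."""
    affected = unaffected = 0
    for line in table.splitlines():
        # The last whitespace-separated field is the count; the rest is the molecule.
        molecule, count = line.rsplit(None, 1)
        if " " in molecule:
            affected += int(count)    # assumed to fail under the old (\S+) pattern
        else:
            unaffected += int(count)
    return affected, unaffected


affected, unaffected = split_counts(COUNTS)
total = affected + unaffected
print(f"{affected} of {total} entries ({100 * affected / total:.1f}%)")
# prints: 16788323 of 44935895 entries (37.4%)
```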


More than one third of the EMBL entries are affected: every entry whose 
molecule value contains a space, about 37% by the counts above.

This error does not affect GenBank entries, which use a different syntax.

I wonder how long this error has been there!


 -Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

