[Bioperl-l] Species name problems with bioperl-db
Roy Chaudhuri
roy at colibase.bham.ac.uk
Thu Jan 25 22:19:00 UTC 2007
Hi.
I'm having problems similar to those discussed in this thread:
http://comments.gmane.org/gmane.comp.lang.perl.bio.general/13766
and in bug 2092.
I'm using the 1.52 release code, which includes Sendu's fix for the
problem, but I'm still getting errors with some species names. The
process seems to fall foul of line 167 of Bio::Species, which checks
that the lineage starts at the species in question.
Here are some of the error messages I'm getting:
Uniprot entry P21215:
MSG: The supplied lineage does not start near 'Clostridium sp.' (I was
supplied 'sp. ATCC29733 | Clostridium | Clostridiaceae | Clostridiales |
Clostridia | Firmicutes | Bacteria')
Uniprot entry Q98AM7:
MSG: The supplied lineage does not start near 'Rhizobium loti' (I was
supplied 'loti | Mesorhizobium | Phyllobacteriaceae | Rhizobiales |
Alphaproteobacteria | Proteobacteria | Bacteria')
GenBank entry CP000026:
MSG: The supplied lineage does not start near 'Salmonella enterica
subsp. enterica serovar Paratyphi A str. ATCC 9150' (I was supplied
'paratyphi | Salmonella | Enterobacteriaceae | Enterobacteriales |
Gammaproteobacteria | Proteobacteria | Bacteria')
It is easy to see why the problems arise: the species name used in the
GenBank/Uniprot entry is sometimes a synonym of the one in the supplied
lineage, rather than an exact duplicate. Is the check on line 167 really
necessary? Or could the throw at least be changed to a warn?
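To illustrate the warn-instead-of-throw idea, here is a minimal sketch in plain Perl. It is not the actual Bio::Species code: the helper names lineage_starts_near and check_lineage are hypothetical, and the matching rule (the name must begin with "Genus species" built from the first two lineage elements) is only an assumption about what a strict check might look like. The three records are the ones quoted above, with each lineage cut to its first two ranks.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: build "Genus species" from a
# species-first lineage and test whether the record's name starts with it.
sub lineage_starts_near {
    my ($name, @lineage) = @_;
    return 0 unless @lineage >= 2;
    my $binomial = "$lineage[1] $lineage[0]";
    return index(lc($name), lc($binomial)) == 0 ? 1 : 0;
}

# Hypothetical wrapper: report a mismatch with warn() instead of die(),
# so that loading can continue for records whose name is a synonym.
sub check_lineage {
    my ($name, @lineage) = @_;
    return 1 if lineage_starts_near($name, @lineage);
    warn "The supplied lineage does not start near '$name' (I was supplied '"
        . join(' | ', @lineage) . "')\n";
    return 0;
}

# Name first, then the first two lineage elements (species, genus).
my %records = (
    P21215   => ['Clostridium sp.', 'sp. ATCC29733', 'Clostridium'],
    Q98AM7   => ['Rhizobium loti', 'loti', 'Mesorhizobium'],
    CP000026 => ['Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150',
                 'paratyphi', 'Salmonella'],
);

for my $id (sort keys %records) {
    my ($name, @lineage) = @{ $records{$id} };
    my $ok = check_lineage($name, @lineage);
    print "$id: ", ($ok ? 'match' : 'mismatch, continuing'), "\n";
}
```

All three records fail this strict comparison, but with warn the loop still reaches the end; with die it would stop at the first one.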
Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.
http://xbase.bham.ac.uk