[Bioperl-l] question about Bio::DB::GenBank

Alan Robinson alan@ebi.ac.uk
Wed, 13 Jun 2001 11:40:19 +0100 (GMT Daylight Time)


Second pass on this (it doesn't solve your immediate problem, but it
explains the cause and may point to an issue in Bio::DB::Embl):

You're seeing the effect of the different query engines at EBI and GenBank
and how they deal with a long-standing agreement between DDBJ, EBI &
GenBank that sequences in the official databases should not be longer than
350 kb (kilobases).

The accession AE004439 is an accession number used for the publication
which is attached to the chopped-up sections of the genome that were
submitted. The sequence itself is too long to exist as a single entry in
the official GenBank, EMBL and DDBJ databases.

However, EBI/DDBJ and GenBank differ in what they return when you ask for
the accession of such a complete genome via their search engines:

 - If you use Entrez to access this accession, NCBI hands you an "illegal"
   superlong sequence entry. But this is what a reasonable person might
   expect.

 - If you use emblfetch or SRS at the EBI (or getentry at DDBJ), we hand
   you a list of the "legal" sections that make up this entry in the
   official database. We provide access to "illegal" long sequences in
   our complete genomes section, http://www.ebi.ac.uk/genomes/, e.g.
   ftp://ftp.ebi.ac.uk/pub/databases/embl/genomes/Bacteria/pmultocida/AE004439.embl


Btw, if you look at the output of emblfetch as used by bioperl, it
contains all 204 sections of the complete genome. My guess is that the
Embl.pm module isn't reading past the first entry.
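
To check that guess independently of bioperl, one could count the entries
in the raw emblfetch response. The following is only a rough sketch in
Python (not bioperl code), assuming the response is plain EMBL flat-file
text in which every entry starts with an "ID" line and ends with a line
containing just "//". The function names and the two-entry sample are made
up for illustration.

```python
def split_embl_entries(text):
    """Split concatenated EMBL flat-file text into individual entries."""
    entries = []
    current = []
    for line in text.splitlines():
        current.append(line)
        # Assumption: each entry is terminated by a line holding only "//".
        if line.strip() == "//":
            entries.append("\n".join(current))
            current = []
    # Keep a trailing unterminated entry rather than silently dropping it.
    if any(l.strip() for l in current):
        entries.append("\n".join(current))
    return entries


def entry_id(entry):
    """Return the identifier from the entry's ID line, or None if absent."""
    for line in entry.splitlines():
        if line.startswith("ID "):
            # The identifier is the first token after the "ID" line code.
            return line.split()[1].rstrip(";")
    return None


# Hypothetical two-entry sample, not real database content.
sample = (
    "ID   EXAMPLE01  standard; DNA; PRO; 12 BP.\n"
    "SQ   Sequence 12 BP;\n"
    "     acgtacgtac gt                                        12\n"
    "//\n"
    "ID   EXAMPLE02  standard; DNA; PRO; 8 BP.\n"
    "SQ   Sequence 8 BP;\n"
    "     acgtacgt                                              8\n"
    "//\n"
)

entries = split_embl_entries(sample)
print(len(entries), [entry_id(e) for e in entries])
```

If the same count on the real response gives 204 while bioperl hands back
a single sequence, that would suggest the parser stops after the first
"//" terminator.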

The EMBL database team are now re-assessing what might be the right
policy (any feedback would be appreciated).

--
============================================================
Alan J. Robinson, D.Phil.             Tel:+44-(0)1223 494444
European Bioinformatics Institute     Fax:+44-(0)1223 494468
EMBL Outstation - Hinxton             Email:  alan@ebi.ac.uk
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, UK                http://industry.ebi.ac.uk/~alan/
============================================================