[BioSQL-l] Re: [Bioperl-l] Re: GO dbxrefs in swissprot

Ewan Birney birney at ebi.ac.uk
Tue Jul 6 15:43:07 EDT 2004



> > Is there a full list of parseable databases (GenBank, EMBL, ENSEMBL?,
> > PDB? etc) and the resp. place to download?
>

Ensembl is best accessed through the Ensembl Perl API, parts of which
still comply with the Bioperl Bio::SeqI interface (i.e., they can be dumped
by SeqIO, and therefore in theory read into BioSQL). Ensembl does make
EMBL dumps *BUT*....


... all we now put in the EMBL dumps are the genes. It is hard enough
trying to keep everything correctly tied down inside the Ensembl system
without also agonising about how data should be represented inside
EMBL/GenBank flat files (or, to put it more clearly, Bio::SeqI objects),
and we clearly can't dump all the SNPs, Features, Genes, Exons, Affy probe
mappings, etc. on our ftp site. We'd simply run out of space by
February each year.


A low-priority project inside Ensembl has been to set up a more functional
ensembl<->bioperl bridge that would give good access to Ensembl objects
through a Bio::SeqI wrapper, presumably using the AnnotationI interface to
its absolute max. This is in the "would be nice to do" category, but we
always have things far higher on the priority stack (e.g., this month's fun
was dealing with selenocysteines).
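
As a rough illustration of what such a bridge would do, here is a minimal
sketch. It is in Python rather than Perl, and every class, method and
identifier in it (EnsemblLikeGene, SeqRecordWrapper, dump_record,
"ENSG_EXAMPLE") is made up for the example; none of it is part of the
Ensembl or Bioperl APIs, and the output only looks vaguely like EMBL, it is
not a valid flat file. The idea is simply that a wrapper exposes the native
object through a small sequence-record interface and pushes everything else
into a generic annotation collection, so that a writer needs to know only
that interface.

```python
from dataclasses import dataclass, field


@dataclass
class EnsemblLikeGene:
    """Hypothetical stand-in for a native gene object (not the real API)."""
    stable_id: str
    dna: str
    description: str = ""
    xrefs: list = field(default_factory=list)  # e.g. ["GO:0005515"]


class SeqRecordWrapper:
    """Hypothetical wrapper exposing a small sequence-record interface,
    loosely inspired by the idea behind Bio::SeqI."""

    def __init__(self, gene):
        self._gene = gene

    def display_id(self):
        return self._gene.stable_id

    def seq(self):
        return self._gene.dna

    def length(self):
        return len(self._gene.dna)

    def annotations(self):
        # Anything that does not fit the plain sequence model goes into a
        # generic key -> list-of-values annotation collection.
        return {
            "description": [self._gene.description],
            "dblink": list(self._gene.xrefs),
        }


def dump_record(record):
    """Render a record using only the generic wrapper interface."""
    lines = [f"ID   {record.display_id()}; {record.length()} BP."]
    for key, values in record.annotations().items():
        for value in values:
            lines.append(f"CC   {key}: {value}")
    lines.append(f"SQ   {record.seq()}")
    lines.append("//")
    return "\n".join(lines)


if __name__ == "__main__":
    gene = EnsemblLikeGene("ENSG_EXAMPLE", "ATGGCGTGA",
                           "made-up gene", ["GO:0005515"])
    print(dump_record(SeqRecordWrapper(gene)))
```

The writer never touches the native object directly, which is why the
wrapper is where the hard decisions about representation end up.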


For more info on the Ensembl Perl API, check out:


http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/CodeTutorial.html






