[Bioperl-l] Re: GO dbxrefs in swissprot
Hilmar Lapp
hlapp at gnf.org
Tue Jul 6 16:49:22 EDT 2004
Hi Ewan, how are you? :-)
On Jul 6, 2004, at 12:43 PM, Ewan Birney wrote:
> Ensembl is best accessed through the Ensembl Perl API
Bluntly, this is - although less extreme - like NCBI saying RefSeq is
best accessed through the NCBI toolkit, and here's how you install
that, and by the way we don't have time to create a genbank-formatted
dump.
That is, I readily believe that if I wanted every detail and every
context of the genome annotation that Ensembl produces, right down to
every special case, then I shouldn't settle for anything less than the
full power of the Ensembl Perl API.
Many times though the "best" access is the least troublesome, or most
familiar, with some loss of content acknowledged. I'm willing to bet
that most people do not access RefSeq through the NCBI toolkit, and that
this wouldn't change even if some content were absent from the
genbank-formatted dump.
Do you foremost want to do a service to the community, or a service to
your development group?
What would be extremely useful is if Ensembl provided a dump in a
common flat-file format that contained all Ensembl-originated content
that one cannot reproduce without a very significant computing and
maintenance effort. As I see it, this would consist of all gene
predictions, transcript predictions, protein predictions, and the
results of the Ensembl annotation pipeline(s) for those predictions;
localizations would be nice, but not required. It doesn't have to be
EMBL format; any flat format that Bio::SeqIO supports and that doesn't
require me to install yet another huge library whose update cycle I
need to keep up with would be very helpful.
(Actually, would the gene-only dump you mentioned have all that as
features and tags?)
IMNSHO this wouldn't be a nice-to-have; it would be terrific and
tremendously increase the value of Ensembl once you're outside of the
Ensembl website. It would also allow people (read: me ;) to, e.g.,
effortlessly load Ensembl along with RefSeq and Swiss-Prot into BioSQL.
Affy probe and any other public sequence mappings I can do myself given
the genome sequence and my own BLAT server (besides, even without one,
UCSC provides all of that for download anyway).
Anyway, my $0.02, which turns out to approach being worth less than a
GBP penny ...
Beer in Glasgow? Meanwhile I could even convince my credit card company
not to shut down my account and that Concorde Services is not a
fraudulent UK male entertainment enterprise :-)
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------