[Bioperl-l] EUtilities interface

Chris Fields cjfields at uiuc.edu
Wed Jun 21 21:16:38 UTC 2006


> -----Original Message-----
> From: Sendu Bala [mailto:bix at sendu.me.uk]
> Sent: Wednesday, June 21, 2006 1:23 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] EUtilities interface
> 
> Chris Fields wrote:
> > I'm working on a new eutilities interface which I hope to commit by late
> > summer.  It's basically a rewrite of WebDBSeqI/NCBIHelper.  I set up a
> > generic web database interface, which I call Bio::DB::WebDBI, and the
> > EUtilities interface, Bio::DB::EUtilitiesI.  The idea is that you can query
> > NCBI for any information available via Entrez Utilities (e.g. taxonomy,
> > pubmed, sequences, dbSNP, Gene, etc.); you're not limited to sequence-only
> > info like Bio::DB::WebDBSeqI.
> >
> > My only concern is confusion over names, particularly WebDBI vs. WebDBSeqI.
> > Does anyone think this will be an issue?
> 
> Well, I don't. Sounds good to me. What's the intended relationship
> between WebDBI and EUtilitiesI? Would your work result in the removal of
> direct XML parsing from Bio::DB::Taxonomy::entrez? Or would it just
> reduce the code that gets the XML to a one-line statement or so?

Well, right now all it does is use URI to build queries, submit them to
Entrez Utilities, and then grab the response; I've been hacking at it on and
off for a few months now.  It still needs error handling and a few more
methods (mainly for proxies and for handling WebEnv/query_key), but once I
have it in decent enough shape I'll go ahead and add it to CVS.
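
For readers unfamiliar with the WebEnv/query_key mechanism, here is a minimal sketch of the request flow, written in Python rather than Perl purely for illustration (urllib.parse.urlencode plays the role URI plays above). It is not the EUtilitiesI API, and the function names are hypothetical. The URL parameters (db, term, usehistory, WebEnv, query_key, rettype, retmode) are standard Entrez Utilities parameters; the sample reply is hand-written, not real NCBI output. The sketch only builds URLs and parses a reply, so it makes no network calls.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL of NCBI's Entrez Utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL that asks NCBI to keep the result set on its
    history server (usehistory=y) so it can be fetched in a later request."""
    params = {"db": db, "term": term, "usehistory": "y", "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def history_from_esearch(xml_text):
    """Return (WebEnv, QueryKey) from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return root.findtext("WebEnv"), root.findtext("QueryKey")


def efetch_url(db, webenv, query_key, rettype="fasta", retmode="text"):
    """Build an EFetch URL that reuses the stored result set instead of
    listing every ID again."""
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "rettype": rettype,
        "retmode": retmode,
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Hand-written sample shaped like an ESearch reply (not real NCBI output).
    sample = """<eSearchResult><Count>2</Count><QueryKey>1</QueryKey>
    <WebEnv>EXAMPLE_WEBENV</WebEnv><IdList><Id>1</Id><Id>2</Id></IdList></eSearchResult>"""
    webenv, key = history_from_esearch(sample)
    print(esearch_url("nucleotide", "BRCA1 AND human[orgn]"))
    print(efetch_url("nucleotide", webenv, key))
```

The point of the history server is that a large result set is searched once and then fetched by reference, which is why WebEnv/query_key handling matters for an interface like this.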

Theoretically, once the response is returned it can be parsed like any other
stream (see WebDBSeqI/NCBIHelper for an idea of how sequences are parsed and
returned using SeqIO).  This should work as long as there is an appropriate
class to handle the data stream and an appropriate 'plugin' to parse the
data into objects; e.g. dbSNP can be handled by ClusterIO::dbSNP, sequences
by SeqIO::genbank/fasta, PubMed by Bio::Biblio::IO::pubmedxml, and so on.
If there's no object for the data, or you just want the raw stream, you could
submit a request using the relevant eutility (efetch, epost, esearch) and
save the raw output to a file or to STDOUT.
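
A minimal sketch of the plugin idea, assuming a simple lookup from Entrez database name to the BioPerl class that could parse the stream. The class names are the ones given above; the table itself, the function names, and the set of databases covered are hypothetical, and Python is used only for illustration (in Perl this would just be a hash). Anything without a known parser, or any request flagged as raw, falls through to plain output.

```python
import sys

# Hypothetical table: Entrez database -> BioPerl class that could parse it.
PARSERS = {
    "snp": "Bio::ClusterIO::dbSNP",
    "nucleotide": "Bio::SeqIO (genbank or fasta)",
    "protein": "Bio::SeqIO (genbank or fasta)",
    "pubmed": "Bio::Biblio::IO::pubmedxml",
}


def choose_handler(db, raw=False):
    """Return the parser for db, or 'raw' when no parser is known or the
    caller explicitly asked for the unparsed stream."""
    if raw or db not in PARSERS:
        return "raw"
    return PARSERS[db]


def save_raw(stream_text, path=None):
    """Write the unparsed response to a file, or to stdout if no path."""
    if path is None:
        sys.stdout.write(stream_text)
    else:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(stream_text)
```

Keeping the lookup separate from the fetching code means adding a new data type would only need a new table entry and its parser.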

Here's a rough diagram:

                      |------------------->Bio::DB::DBFetch (EBI interface)----->plugins for Bio* classes
Bio::Root::Root       |
LWP::UserAgent ------Bio::DB::WebDBI------>Bio::DB::EUtilitiesI (NCBI interface)----->plugins for Bio* classes
                      |
                      |------------------->others?
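
The diagram can be read as a small class hierarchy. Below is a minimal sketch, assuming each back end only needs its own base URL and parameter names. The class names are hypothetical stand-ins for WebDBI, EUtilitiesI and DBFetch, and the injected transport callable stands in for LWP::UserAgent so the example runs offline. The EFetch parameters and the EBI dbfetch parameters (db, id, format, style) are the services' public ones.

```python
from urllib.parse import urlencode


class WebDB:
    """Generic web-database client (stand-in for Bio::DB::WebDBI).
    The transport is injected so the sketch runs without network access;
    a real client would wrap an HTTP user agent here."""

    base_url = ""

    def __init__(self, transport):
        self.transport = transport  # callable: url -> response text

    def build_url(self, path, params):
        return self.base_url + path + "?" + urlencode(params)

    def get(self, path, params):
        return self.transport(self.build_url(path, params))


class EUtilities(WebDB):
    """NCBI back end (stand-in for Bio::DB::EUtilitiesI)."""

    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

    def efetch(self, db, ids, rettype="fasta", retmode="text"):
        params = {"db": db, "id": ",".join(ids),
                  "rettype": rettype, "retmode": retmode}
        return self.get("efetch.fcgi", params)


class DBFetch(WebDB):
    """EBI back end (stand-in for Bio::DB::DBFetch)."""

    base_url = "https://www.ebi.ac.uk/Tools/dbfetch/"

    def fetch(self, db, ids, fmt="fasta"):
        params = {"db": db, "id": ",".join(ids), "format": fmt, "style": "raw"}
        return self.get("dbfetch", params)
```

With this split, proxy support and error handling would naturally live in the shared base class, while each subclass only knows how its service spells a request.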

You probably don't need a Bio::*IO plugin for each data type; taxonomy data
in Bioperl seems to come primarily from the NCBI Taxonomy database, so
Bio::DB::Taxonomy::entrez shouldn't be too hard to adapt to act as a plugin.
Bio::DB::Taxonomy::entrez uses XML::Twig to parse everything into
Bio::Taxonomy::Node objects and can retrieve single and multiple IDs using
the same method, though I would probably use XML::SAX instead.  If I
remember correctly, there were some issues with Bio::DB::Taxonomy that you
brought up...
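
A minimal sketch of the event-driven approach, using Python's xml.sax as an analogue of XML::SAX. TaxonNode is a hypothetical stand-in for Bio::Taxonomy::Node, and the element names (TaxaSet, Taxon, TaxId, ScientificName, Rank, LineageEx) follow NCBI's taxonomy EFetch XML. Because one response can hold several Taxon records, the same handler covers single and multiple IDs; tracking the element path keeps the ancestor taxa nested under LineageEx from being mistaken for top-level records.

```python
import xml.sax
from dataclasses import dataclass


@dataclass
class TaxonNode:
    """Minimal stand-in for Bio::Taxonomy::Node."""
    tax_id: str = ""
    name: str = ""
    rank: str = ""


class TaxonHandler(xml.sax.ContentHandler):
    """Collect one TaxonNode per top-level <Taxon> inside <TaxaSet>."""

    FIELDS = {"TaxId": "tax_id", "ScientificName": "name", "Rank": "rank"}

    def __init__(self):
        super().__init__()
        self.path = []   # names of the currently open elements
        self.text = []   # character data seen since the last start tag
        self.nodes = []

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text = []
        if self.path == ["TaxaSet", "Taxon"]:
            self.nodes.append(TaxonNode())

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        # Only fields that are direct children of a top-level Taxon count;
        # taxa nested under LineageEx have a longer path and are skipped.
        if (len(self.path) == 3 and self.path[:2] == ["TaxaSet", "Taxon"]
                and name in self.FIELDS):
            setattr(self.nodes[-1], self.FIELDS[name], "".join(self.text).strip())
        self.path.pop()


def parse_taxa(xml_text):
    """Parse a taxonomy EFetch-style document into a list of TaxonNode."""
    if isinstance(xml_text, str):
        xml_text = xml_text.encode("utf-8")
    handler = TaxonHandler()
    xml.sax.parseString(xml_text, handler)
    return handler.nodes
```

Unlike a tree-building parser, the handler only keeps the records it is building, so memory use stays small even for large multi-ID responses.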

Chris
