[Bioperl-l] Database Retrieval

Sean Davis sdavis2 at mail.nih.gov
Tue Aug 8 13:09:35 UTC 2006




On 8/8/06 8:49 AM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> Most of the Bio::DB::* classes implement Bio::DB::RandomAccessI,
> which is the origin of the get_Seq_by* methods that Bio::DB::GenBank
> and others use.  You could create a set of modules which implements
> an interface like RandomAccessI, grab the raw data on the backend
> using a UCSC-specific DB handle (using MySQL or whatever) or web
> agent, and get them into Bio* objects.

I can look into this as a limited solution.

> This is what Bio::DB::GenBank does.  It inherits from
> Bio::DB::NCBIHelper and Bio::DB::WebDBSeqI.  WebDBSeqI implements
> methods from RandomAccessI and adds a web agent; NCBIHelper inherits
> from WebDBSeqI and adds NCBI-specific parameters for remote access of
> the Entrez protein and nucleotide databases.

These have relatively clean, well-defined APIs; UCSC does not.  If you have
access to the UCSC source code, just take a look at joiner.doc to see the
mess.  Accessing NCBI is quite a different matter from accessing UCSC, I
think.

> If you have the critical backend class made (remote or local access
> to the database), an interface could be designed similar to
> Bio::DB::GenBank.

That critical backend is not straightforward, as noted above, but I'll think
about it more.  

Unlike GenBank, where each "object" is the same kind of record, there is no
such single entity at UCSC, so returning data from UCSC is potentially much
more complicated, with special cases for refSeq, knownGene, ESTs, mRNAs,
BACs, SNPs, CpG islands, etc.  All I'm saying is that the design of UCSC
places some constraints on at least the implementation of the interface, if
not also on the design of the API.
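
To make the concern concrete, here is a rough sketch of the shape such a
module might take.  It is in Python rather than Perl purely for
illustration, and everything in it is an assumption: the RandomAccess
class only mimics the role of Bio::DB::RandomAccessI, and the backend,
track names, IDs, and field names are invented rather than taken from the
UCSC schema.  The point is only that each kind of data needs its own
converter behind the common lookup method.

```python
from abc import ABC, abstractmethod


class RandomAccess(ABC):
    """Hypothetical analogue of the role Bio::DB::RandomAccessI plays."""

    @abstractmethod
    def get_by_id(self, identifier):
        """Return one record for identifier, or None if it is unknown."""


# Made-up stand-in for the backend (a MySQL handle or web agent in practice).
# The track names, IDs, and field names are invented for illustration.
FAKE_BACKEND = {
    "genes": {"gene_1": {"chrom": "chr1", "exon_starts": [100, 300],
                         "exon_ends": [200, 400]}},
    "snps": {"snp_1": {"chrom": "chr2", "pos": 5000}},
}


def gene_to_feature(identifier, row):
    # A gene spans from its first exon start to its last exon end.
    return {"id": identifier, "type": "gene", "chrom": row["chrom"],
            "start": min(row["exon_starts"]), "end": max(row["exon_ends"])}


def snp_to_feature(identifier, row):
    # A SNP is a single position, so start and end coincide.
    return {"id": identifier, "type": "snp", "chrom": row["chrom"],
            "start": row["pos"], "end": row["pos"]}


# One converter per track: this table is where the special cases pile up.
CONVERTERS = {"genes": gene_to_feature, "snps": snp_to_feature}


class TrackDB(RandomAccess):
    """Looks up raw rows for one track and converts them to a common shape."""

    def __init__(self, track, backend=FAKE_BACKEND):
        if track not in CONVERTERS:
            raise ValueError(f"no converter for track {track!r}")
        self._rows = backend.get(track, {})
        self._convert = CONVERTERS[track]

    def get_by_id(self, identifier):
        row = self._rows.get(identifier)
        return None if row is None else self._convert(identifier, row)
```

With this toy data, TrackDB("genes").get_by_id("gene_1") gives a feature
spanning 100 to 400 on chr1, while TrackDB("snps").get_by_id("snp_1") gives
a single-position feature.  The lookup method is uniform, but every new
track means another converter, which is the implementation constraint I
mean.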

Sean



