[Bioperl-l] final proposal: Bio::DB::WebSeqDBI

Jason Stajich jason@chg.mc.duke.edu
Mon, 11 Dec 2000 18:05:52 -0500 (EST)


This is the final proposal before I commit the code (all tests pass on my
machine).

Two new modules:
Bio::DB::WebSeqDBI ISA Bio::DB::RandomAccessI
Bio::DB::NCBIHelper ISA Bio::DB::WebSeqDBI

Rewrites of Bio::DB::GenBank, Bio::DB::GenPept, and Bio::DB::SwissProt.

Bio::DB::WebSeqDBI -

This interface encapsulates the standard data retrieval methods for a
Web sequence database. Implementing classes must implement the method
get_request, which takes as arguments a hash of qualifiers (uids, format,
etc.) with which to query the database and returns an HTTP::Request
object. The WebSeqDBI class manages an LWP::UserAgent for obtaining data
from the web databases and turning that data stream into a Bio::SeqIO.
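A minimal sketch of the interface/implementation split described above, in plain core Perl. The package names My::WebSeqDBI and My::ExampleDB, the -uids/-format keys, and the example.org URL are all hypothetical. To keep the sketch runnable without libwww-perl, get_request returns the query URL as a string. The real class would wrap it in an HTTP::Request (for example HTTP::Request->new(GET => $url)).

```perl
use strict;
use warnings;

# Hypothetical interface class: shared machinery would live here, while
# get_request is left for implementing classes to define.
package My::WebSeqDBI;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Abstract method: calling it on a class that did not override it dies.
sub get_request {
    my ($self) = @_;
    die ref($self) . " must implement get_request\n";
}

# Hypothetical implementing class.
package My::ExampleDB;
our @ISA = ('My::WebSeqDBI');

# Takes a hash of qualifiers and builds the query. In this core-Perl
# sketch it returns a URL string rather than an HTTP::Request object.
sub get_request {
    my ($self, %qual) = @_;
    my $uids   = join ',', @{ $qual{-uids} || [] };
    my $format = $qual{-format} || 'fasta';
    return "http://example.org/query?id=$uids&format=$format";
}

package main;

my $db  = My::ExampleDB->new;
my $url = $db->get_request(-uids => ['ACC1', 'ACC2'], -format => 'genbank');
print "$url\n";    # http://example.org/query?id=ACC1,ACC2&format=genbank
```

The base class dies with a clear message instead of silently returning nothing, so a subclass that forgets to implement get_request fails loudly the first time it is used.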

Because of the way LWP works right now, it is not possible to take a data
stream from the web server and transform it directly into a Bio::SeqIO.
Instead, one must read all the data from the server and then either store
it in a tempfile or wrap it in an IO::String, which can be treated as a
filehandle. Another nuisance: the data retrieved from NCBI contains some
HTML 'contamination', which needs to be screened out with a call to
postprocess_data.
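The read-everything-then-wrap pipeline above can be sketched with core modules only. The helper data_to_fh and the 'tempfile'/'in_memory' type names are hypothetical. As an alternative to IO::String, the sketch opens a filehandle directly on a scalar, which core Perl has supported since 5.8. The postprocess_data body is also an assumption: it supposes the contamination is a <pre> wrapper plus HTML-escaped characters, and the real NCBI output may differ.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical postprocess_data: ASSUMES the HTML contamination is a
# <pre>...</pre> wrapper plus escaped characters; real NCBI output may differ.
sub postprocess_data {
    my ($data) = @_;
    $data =~ s{</?pre>\n?}{}gi;    # drop the <pre> / </pre> wrapper
    $data =~ s/&lt;/</g;           # decode escapes; &amp; last so that
    $data =~ s/&gt;/>/g;           # "&amp;lt;" becomes "&lt;", not "<"
    $data =~ s/&amp;/&/g;
    return $data;
}

# Hypothetical helper: wrap fully-read content in a filehandle that a
# parser such as Bio::SeqIO could read from.
sub data_to_fh {
    my ($content, $retrievaltype) = @_;
    if ($retrievaltype eq 'tempfile') {
        # File::Temp deletes the file when this object is destroyed,
        # so whoever reads from it must keep a reference to it.
        my $tmp = File::Temp->new(UNLINK => 1);
        print {$tmp} $content;
        $tmp->flush;
        seek($tmp, 0, 0) or die "seek failed: $!";
        return $tmp;
    }
    # Core-Perl alternative to IO::String: a filehandle opened on a scalar.
    open(my $fh, '<', \$content) or die "cannot open in-memory handle: $!";
    return $fh;
}

# Stand-in for the server reply after the whole response has been read.
my $raw = "<pre>\n&gt;seq1\nACGT\n</pre>\n";
for my $type (qw(tempfile in_memory)) {
    my $fh = data_to_fh(postprocess_data($raw), $type);
    local $/;                      # slurp mode for this iteration
    print "$type: ", <$fh>;
}
```

Both branches hand back something that behaves like a filehandle, so the code that builds the Bio::SeqIO does not need to care which retrieval type was chosen.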

One issue I am not sure how best to deal with is the removal of the
temporary file at the end of the Bio::DB::WebSeqDBI object's life. The
following code illustrates a case where this removes the file too soon:

use Bio::DB::GenBank;

my $seqdb = Bio::DB::GenBank->new(-retrievaltype => 'tempfile');
my $seqio = $seqdb->get_Stream_by_id($accession);
undef $seqdb;  # destroys the seqdb object, which cleans up the
               # tempfile it created
my $seq = $seqio->next_seq(); # dies because the tempfile no longer exists

If anyone has better ideas on this, please let me know.
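One possible approach is sketched below with hypothetical stand-in classes (My::DB and My::Stream), not the real Bioperl ones. Instead of having the database object delete its tempfiles in its destructor, hand the File::Temp object to the stream itself. Perl's reference counting then keeps the file on disk for exactly as long as something can still read from it, so undef $seqdb becomes harmless. The sketch relies on File::Temp's default of unlinking the file when the object is destroyed.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical stand-in for the stream object (Bio::SeqIO in the real code).
package My::Stream;
sub new {
    my ($class, %args) = @_;
    # Keeping the File::Temp object here ties the file's lifetime to the
    # stream rather than to the database object that created it.
    return bless { tmp => $args{-tempfile} }, $class;
}
sub next_line { my $fh = $_[0]{tmp}; return scalar <$fh>; }

# Hypothetical stand-in for the database object.
package My::DB;
sub new { bless {}, shift }
sub get_Stream_by_id {
    my ($self, $id) = @_;
    my $tmp = File::Temp->new(UNLINK => 1);
    print {$tmp} ">$id\nACGT\n";   # stand-in for downloaded data
    $tmp->flush;
    seek($tmp, 0, 0) or die "seek failed: $!";
    return My::Stream->new(-tempfile => $tmp);
}

package main;

my $seqdb  = My::DB->new;
my $stream = $seqdb->get_Stream_by_id('ACC1');
my $file   = $stream->{tmp}->filename;
undef $seqdb;                  # the file survives: the stream still holds it
my $line = $stream->next_line;
print $line;                   # prints ">ACC1"
undef $stream;                 # last reference gone: File::Temp unlinks the file
my $status = -e $file ? 'still there' : 'removed';
print "$status\n";             # prints "removed"
```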

Bio::DB::NCBIHelper -

Since Bio::DB::GenBank and Bio::DB::GenPept are so similar, I wrote a
class that encapsulates all of the common functionality for retrieving
sequence data from these databases.
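A minimal sketch of how such a helper might factor out the shared code. The hypothetical base class My::NCBIHelper builds the request, and the GenBank and GenPept stand-ins only state which NCBI database to query ('nucleotide' or 'protein'). The default_db hook, the -uids/-format keys, and the example.org URL are assumptions, not the actual module's API.

```perl
use strict;
use warnings;

# Hypothetical sketch: shared NCBI logic lives in one helper class and the
# concrete databases differ only in which NCBI database they query.
package My::NCBIHelper;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub default_db { die ref(shift) . " must define default_db\n" }

# Common functionality: build the query from a list of ids and a format.
sub get_request {
    my ($self, %qual) = @_;
    my $ids    = join ',', @{ $qual{-uids} || [] };
    my $format = $qual{-format} || 'fasta';
    return sprintf 'http://example.org/query?db=%s&id=%s&format=%s',
        $self->default_db, $ids, $format;
}

package My::GenBank;
our @ISA = ('My::NCBIHelper');
sub default_db { 'nucleotide' }

package My::GenPept;
our @ISA = ('My::NCBIHelper');
sub default_db { 'protein' }

package main;

for my $db (My::GenBank->new, My::GenPept->new) {
    my $url = $db->get_request(-uids => ['ACC1']);
    print ref($db), ": $url\n";
}
```

Adding another NCBI database under this design would mean writing one more small subclass that overrides default_db.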

I'm sure it will all make much more sense once I check the code in; I just
wanted to see whether anyone has comments or wants clarification before I
check in major reworks of the current modules.

Is the name WebSeqDBI misleading (i.e., does it look like a DBI
module)? We like to use 'I' at the end of a module name to denote
interfaces.

-Jason
Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/