[Bioperl-l] final proposal: Bio::DB::WebSeqDBI

Ewan Birney birney@ebi.ac.uk
Tue, 12 Dec 2000 10:21:16 +0000 (GMT)


On Mon, 11 Dec 2000, Jason Stajich wrote:

> The final proposal before I commit the code (all tests pass on my
> machine).
> 
> Two new modules:
> Bio::DB::WebSeqDBI - ISA Bio::DB::RandomAccessI
> Bio::DB::NCBIHelper - ISA Bio::DB::WebSeqDBI
> 
> Rewrites of Bio::DB::GenBank, Bio::DB::GenPept, and Bio::DB::SwissProt.
> 
> Bio::DB::WebSeqDBI -
> 
> This interface encapsulates the standard data retrieval methods for a
> Web Sequence Database. Implementing classes must implement the method
> get_request, which takes as arguments a hash of qualifiers (uids, format,
> etc.) with which to query the database, and returns an HTTP::Request
> object. The WebSeqDBI class manages an LWP::UserAgent for obtaining data
> from the web dbs and turning that data stream into a Bio::SeqIO.
> 
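A minimal sketch of the get_request contract described above, in plain Perl. It assumes nothing about the real module: WebSeqDBISketch and ExampleWebDB are hypothetical names, the URL and parameter names are invented, and, to stay free of non-core dependencies, the implementing class returns a hash ref describing the request instead of the HTTP::Request object the proposal specifies.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed interface: it only fixes the
# contract that implementing classes must fulfil.
package WebSeqDBISketch;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Abstract method: implementing classes must override this.
sub get_request {
    my ($self) = @_;
    die ref($self) . " must implement get_request\n";
}

# Hypothetical implementing class.
package ExampleWebDB;
our @ISA = ('WebSeqDBISketch');

# Turn query qualifiers into a request description. The base URL and
# parameter names are invented for illustration.
sub get_request {
    my ($self, %qualifiers) = @_;
    my $uids = $qualifiers{-uids} or die "no uids given\n";
    my @ids    = ref $uids eq 'ARRAY' ? @$uids : ($uids);
    my $format = $qualifiers{-format} || 'genbank';
    my $url    = 'http://example.org/fetch?format=' . $format
               . '&id=' . join(',', @ids);
    # A real implementation would return HTTP::Request->new(GET => $url).
    return { method => 'GET', uri => $url };
}

package main;

my $db  = ExampleWebDB->new;
my $req = $db->get_request(-uids => ['A1', 'B2'], -format => 'fasta');
print "$req->{method} $req->{uri}\n";
```

In the proposed design the base class would then hand that request to its LWP::UserAgent, so each database only has to differ in how it builds the request.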
> Because of the way LWP works right now, it is not possible to stream data
> from the web server directly into a Bio::SeqIO. Instead, one must read all
> the data from the server and then either store it in a tempfile or wrap
> it in an IO::String, which can be treated as a filehandle.
> Another annoyance: the data retrieved from NCBI contains some HTML
> 'contamination', which has to be screened out by a call to postprocess_data.
> 
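A sketch of the read-everything-then-wrap approach. postprocess_data here is a hypothetical stand-in that assumes the record sits inside a <pre>...</pre> block; the real NCBI markup may differ. The sketch also swaps IO::String for the in-memory filehandles that Perl gained in 5.8, which postdate this discussion.

```perl
use strict;
use warnings;

# Hypothetical stand-in for postprocess_data: assumes the useful record is
# wrapped in <pre>...</pre> and strips any remaining HTML tags.
sub postprocess_data {
    my ($data) = @_;
    if ($data =~ m{<pre>(.*?)</pre>}is) {
        $data = $1;
    }
    $data =~ s/<[^>]+>//g;    # drop leftover tags such as <a href=...>
    return $data;
}

# Stand-in for the full response body already read from the server.
my $response_body = "<html><body><pre>LOCUS       TEST1\n"
                  . "ORIGIN\n//\n</pre></body></html>";

my $clean = postprocess_data($response_body);

# Perl 5.8+ can open a filehandle on a string held in memory, playing
# the role IO::String plays on older perls.
open(my $fh, '<', \$clean) or die "cannot open in-memory handle: $!";
while (my $line = <$fh>) {
    print "read: $line";
}
close $fh;
```

In a real module the handle would instead be passed on, e.g. Bio::SeqIO->new(-fh => $fh, -format => 'genbank').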
> One issue I am not sure how best to deal with is temporary file removal
> at the end of the life of the Bio::DB::WebSeqDBI object. The following
> code illustrates a case where this removes files too soon.
> 
> use Bio::DB::GenBank;
> my $seqdb = new Bio::DB::GenBank(-retrievaltype => 'tempfile');
> my $seqio = $seqdb->get_Stream_by_id($accession);
> undef $seqdb;  # this destroys the seqdb object, which cleans up the
>                # tempfile that was created
> my $seq = $seqio->next_seq(); # dies because the file no longer exists
> 
> Anyone with better ideas on this feel free to let me know.
> 
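One possible fix, sketched under the assumption that the returned stream can carry extra data: let the stream hold its own reference to the File::Temp object, so the file is unlinked only after the database object and every stream reading from it are gone. SketchDB and SketchStream are hypothetical stand-ins, not BioPerl classes; File::Temp (core Perl) removes the file when its object is destroyed.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical stand-in for the database object (not the BioPerl API).
package SketchDB;

sub new { return bless { tempfiles => [] }, shift }

# Save the downloaded data to a temp file and return a stream that keeps
# its own reference to the File::Temp object. File::Temp unlinks the file
# only when the last reference to that object goes away.
sub get_stream {
    my ($self, $data) = @_;
    my $tmp = File::Temp->new(UNLINK => 1);
    print {$tmp} $data;
    close $tmp or die "close failed: $!";
    open(my $fh, '<', $tmp->filename) or die "reopen failed: $!";
    push @{ $self->{tempfiles} }, $tmp;    # shared, not exclusive, ownership
    return SketchStream->new(fh => $fh, tempfile => $tmp);
}

# Hypothetical stand-in for the returned Bio::SeqIO stream.
package SketchStream;

sub new       { my ($class, %args) = @_; return bless {%args}, $class }
sub next_line { my $fh = $_[0]{fh}; return scalar <$fh> }
sub filename  { return $_[0]{tempfile}->filename }

package main;

my $db     = SketchDB->new;
my $stream = $db->get_stream("LOCUS       TEST1\n//\n");
undef $db;                       # database object gone, file survives
my $line = $stream->next_line;   # still readable
print "first line: $line";
my $path = $stream->filename;
undef $stream;                   # last reference dropped: file unlinked
print "temp file still exists: ", (-e $path ? "yes" : "no"), "\n";
```

This leans on Perl's reference counting, so the database object's destructor no longer has to guess whether a stream is still reading the file.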
> Bio::DB::NCBIHelper -
> 
> Since Bio::DB::GenBank and Bio::DB::GenPept are so similar, I wrote a
> class that encapsulates all of the common functionality for retrieving
> sequence data from these databases.
> 
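A sketch of the shared-base-class idea: common request-building logic lives once in a parent, and each subclass overrides only what differs. All class names are hypothetical, and the parameter names (db, rettype, id) are illustrative assumptions, not the interface NCBI offered at the time.

```perl
use strict;
use warnings;

# Hypothetical shared base class (not the real Bio::DB::NCBIHelper).
{
    package NCBIHelperSketch;

    sub new {
        my ($class, %args) = @_;
        my $self = bless {}, $class;
        $self->{format} = $args{-format} || $self->default_format;
        return $self;
    }

    # Per-database hooks that subclasses override.
    sub db_name        { die ref(shift) . " must define db_name\n" }
    sub default_format { return 'genbank' }

    # Shared logic, written once for every NCBI-backed database.
    sub query_params {
        my ($self, @ids) = @_;
        return (db      => $self->db_name,
                rettype => $self->{format},
                id      => join(',', @ids));
    }
}

{
    package GenBankSketch;               # hypothetical nucleotide client
    our @ISA = ('NCBIHelperSketch');
    sub db_name { return 'nucleotide' }
}

{
    package GenPeptSketch;               # hypothetical protein client
    our @ISA = ('NCBIHelperSketch');
    sub db_name        { return 'protein' }
    sub default_format { return 'genpept' }
}

for my $db (GenBankSketch->new, GenPeptSketch->new(-format => 'fasta')) {
    my %p = $db->query_params('ID1', 'ID2');
    print join(' ', map { "$_=$p{$_}" } sort keys %p), "\n";
}
```

Adding another NCBI database under this layout would mean one more small subclass rather than a copy of the retrieval code.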
> I'm sure it will all make much more sense once I check the code in; I just
> wanted to see if anyone has comments or wants clarification before I
> check in major reworks of the current modules.
> 
> Is the name WebSeqDBI misleading (i.e., does it look like it would be a
> DBI module)? We like to use 'I' at the end of a module name to denote
> interfaces.

I know where you are coming from, but I do think we have to do something
different here in the naming. WebDBSeqI?


> 
> -Jason
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center 
> http://www.chg.duke.edu/ 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------