[Biojava-l] sequence dbs

Jason Stajich jason@chg.mc.duke.edu
Tue, 15 May 2001 17:18:54 -0400 (EDT)


I started to work on this at biojava bootcamp, didn't get very far because
of the following:
seq.db.SequenceDB currently has the following methods that one cannot
implement for 'remote' databases.

<   Set ids();
<   SequenceIterator sequenceIterator();

<   void addSequence(Sequence seq)
<   throws IllegalIDException, BioException, ChangeVetoException;
<   void removeSequence(String id)
<   throws IllegalIDException, BioException, ChangeVetoException;

I started to split these methods into separate interfaces -
LocalSequenceDB for ids() and sequenceIterator(), and UpdateableSequenceDB
for add/remove.  This of course breaks all classes which depend on
SequenceDB.  The other option is to create a RemoteSequenceDB which throws
ChangeVetoException for add/remove calls and some other exception for
ids()/sequenceIterator().
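
As a rough illustration of the first option, the split might look like the
sketch below.  It does not depend on BioJava: Seq, IllegalIDException and the
interfaces are simplified stand-ins, and getSequence(String) as the one method
left in the base interface is an assumption, as is the InMemorySequenceDB
class.  It also uses generics for brevity.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a BioJava Sequence: just an id and its residues.
final class Seq {
    final String id;
    final String residues;
    Seq(String id, String residues) { this.id = id; this.residues = residues; }
}

// Hypothetical stand-in for BioJava's IllegalIDException.
class IllegalIDException extends Exception {
    IllegalIDException(String message) { super(message); }
}

// What every database, local or remote, can offer: lookup by id (assumed).
interface SequenceDB {
    Seq getSequence(String id) throws Exception;
}

// Only databases that can enumerate their contents implement this.
interface LocalSequenceDB extends SequenceDB {
    Set<String> ids();
    Iterator<Seq> sequenceIterator();
}

// Only databases that accept changes implement this.
interface UpdateableSequenceDB extends SequenceDB {
    void addSequence(Seq seq) throws IllegalIDException;
    void removeSequence(String id) throws IllegalIDException;
}

// A local, writable database simply implements both extensions.
final class InMemorySequenceDB implements LocalSequenceDB, UpdateableSequenceDB {
    private final Map<String, Seq> byId = new LinkedHashMap<>();

    public Seq getSequence(String id) throws IllegalIDException {
        Seq s = byId.get(id);
        if (s == null) throw new IllegalIDException("No sequence with id " + id);
        return s;
    }

    public Set<String> ids() {
        return Collections.unmodifiableSet(byId.keySet());
    }

    public Iterator<Seq> sequenceIterator() {
        return Collections.unmodifiableCollection(byId.values()).iterator();
    }

    public void addSequence(Seq seq) throws IllegalIDException {
        if (byId.containsKey(seq.id)) throw new IllegalIDException("Duplicate id " + seq.id);
        byId.put(seq.id, seq);
    }

    public void removeSequence(String id) throws IllegalIDException {
        if (byId.remove(id) == null) throw new IllegalIDException("No sequence with id " + id);
    }
}
```

With this layout a remote database would implement only the base interface,
so code that needs ids() or addSequence() has to ask for the narrower type at
compile time rather than catching an exception at run time.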

BTW: An example of a RemoteDB is web EMBL queries, which we will patch
through HTTP to extract a sequence from this database (will be talking to
Heikki's web script).  Similarly, if the GenBank parsing works, we can send
queries to NCBI GenBank to look up a sequence by accession number.
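
For comparison, the second option (one wide interface, with a remote
implementation that refuses what it cannot do) might be sketched roughly as
follows.  Everything here is hypothetical: AccessionFetcher stands in for the
HTTP round trip to whatever script or service is used, WideSequenceDB is a
simplified mirror of SequenceDB with plain Strings instead of Sequence
objects, ChangeVetoException is a local stand-in, and the choice of
UnsupportedOperationException for ids()/sequenceIterator() is just one
possibility for the "some other exception".

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical stand-in for BioJava's ChangeVetoException.
class ChangeVetoException extends Exception {
    ChangeVetoException(String message) { super(message); }
}

// Stand-in for the HTTP call: maps an accession to raw sequence text,
// or returns null when the remote service has no such entry.
interface AccessionFetcher {
    String fetch(String accession) throws IOException;
}

// Simplified mirror of the full contract, with String in place of Sequence.
interface WideSequenceDB {
    String getSequence(String id) throws Exception;
    Set<String> ids();
    Iterator<String> sequenceIterator();
    void addSequence(String id, String residues) throws ChangeVetoException;
    void removeSequence(String id) throws ChangeVetoException;
}

// Remote database: lookup by accession works, everything else is refused.
final class RemoteSequenceDB implements WideSequenceDB {
    private final AccessionFetcher fetcher;

    RemoteSequenceDB(AccessionFetcher fetcher) { this.fetcher = fetcher; }

    public String getSequence(String id) throws IOException {
        String record = fetcher.fetch(id);
        if (record == null) throw new NoSuchElementException("Unknown accession " + id);
        return record;
    }

    public Set<String> ids() {
        throw new UnsupportedOperationException("cannot enumerate a remote database");
    }

    public Iterator<String> sequenceIterator() {
        throw new UnsupportedOperationException("cannot iterate over a remote database");
    }

    public void addSequence(String id, String residues) throws ChangeVetoException {
        throw new ChangeVetoException("remote database is read-only");
    }

    public void removeSequence(String id) throws ChangeVetoException {
        throw new ChangeVetoException("remote database is read-only");
    }
}
```

Existing callers of the wide interface keep compiling under this approach,
but they only find out at run time that a given database is remote.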

One other major issue is: what if we do not know what type of sequence we
are obtaining (prot or [dr]na)?  Biojava likes to have these things
established in the parser - but I won't really be able to divine anything
from an accession number.  Ideas?

-jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/