[Biojava-l] sequence dbs

Schreiber, Mark mark.schreiber@agresearch.co.nz
Wed, 16 May 2001 09:30:36 +1200


Hi -

I very much favour the idea of having a remote SequenceDB rather than
breaking the substantial amount of code that uses SequenceDB. I use
SequenceDB all the time in my programs so I guess I am keen to not have to
recode it all.
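Mark's preferred option (the second one in Jason's message) keeps the SequenceDB contract and lets a remote implementation refuse the operations it cannot support. Below is a minimal sketch that uses simplified stand-in types rather than the real BioJava interfaces, so it compiles on its own. Sequences are plain Strings. `SimpleSequenceDB` and the `fetcher` function are hypothetical, and the local `ChangeVetoException` only mimics BioJava's class of that name. `UnsupportedOperationException` is one possible choice for the "some other exception" on ids()/sequenceIterator().

```java
import java.util.Iterator;
import java.util.Set;
import java.util.function.Function;

// Stand-in for BioJava's ChangeVetoException so the sketch needs no library.
class ChangeVetoException extends Exception {
    ChangeVetoException(String message) {
        super(message);
    }
}

// Hypothetical, simplified version of the SequenceDB contract (sequences as Strings).
interface SimpleSequenceDB {
    Set<String> ids();
    Iterator<String> sequenceIterator();
    String getSequence(String id);
    void addSequence(String id, String seq) throws ChangeVetoException;
    void removeSequence(String id) throws ChangeVetoException;
}

// Read-only remote database: single-ID lookups work, enumeration and edits are refused.
class RemoteSequenceDB implements SimpleSequenceDB {
    // Hypothetical hook: maps an accession to its sequence, or null if unknown.
    // A real implementation would issue an HTTP query here.
    private final Function<String, String> fetcher;

    RemoteSequenceDB(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    @Override
    public Set<String> ids() {
        throw new UnsupportedOperationException("cannot list all IDs of a remote database");
    }

    @Override
    public Iterator<String> sequenceIterator() {
        throw new UnsupportedOperationException("cannot iterate over a remote database");
    }

    @Override
    public String getSequence(String id) {
        String seq = fetcher.apply(id);
        if (seq == null) {
            throw new IllegalArgumentException("no such ID: " + id);
        }
        return seq;
    }

    @Override
    public void addSequence(String id, String seq) throws ChangeVetoException {
        throw new ChangeVetoException("remote database is read-only");
    }

    @Override
    public void removeSequence(String id) throws ChangeVetoException {
        throw new ChangeVetoException("remote database is read-only");
    }
}
```

Existing callers that only use getSequence() keep working unchanged. Only code that enumerates or edits the database has to cope with the new failures.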

As for parsing an unknown sequence type, a simple (if inefficient) way to do
it would be to read the record once as text (or XML) to determine the
correct alphabet, then parse it for real. I don't know if this can be done
dynamically with the current biojava parsers. Maybe parsers based on a SAX
event model would be the way to go?
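The first pass of that two-pass idea could look like the following minimal sketch. It assumes the record has already been fetched as text and its residue lines extracted. `AlphabetGuesser`, `GuessedAlphabet` and the 90% threshold are hypothetical and not part of BioJava.

```java
import java.util.Locale;

// Hypothetical first pass of a two-pass parse: guess the alphabet from raw residues.
class AlphabetGuesser {

    enum GuessedAlphabet { DNA, RNA, PROTEIN }

    // Assumed threshold: at least 90% nucleotide letters means nucleic acid.
    private static final double NUCLEOTIDE_FRACTION = 0.9;

    static GuessedAlphabet guess(String residues) {
        int letters = 0;
        int nucleotides = 0;
        boolean sawT = false;
        boolean sawU = false;
        for (char c : residues.toUpperCase(Locale.ROOT).toCharArray()) {
            if (!Character.isLetter(c)) {
                continue; // skip whitespace, digits and gap characters
            }
            letters++;
            switch (c) {
                case 'A': case 'C': case 'G': case 'N':
                    nucleotides++;
                    break;
                case 'T':
                    nucleotides++;
                    sawT = true;
                    break;
                case 'U':
                    nucleotides++;
                    sawU = true;
                    break;
                default:
                    break;
            }
        }
        if (letters == 0) {
            throw new IllegalArgumentException("no residues to inspect");
        }
        if (nucleotides >= NUCLEOTIDE_FRACTION * letters) {
            // U without T suggests RNA; otherwise treat nucleic acid as DNA.
            return (sawU && !sawT) ? GuessedAlphabet.RNA : GuessedAlphabet.DNA;
        }
        return GuessedAlphabet.PROTEIN;
    }
}
```

The second pass would then hand the full record to whichever parser matches the guess. The heuristic can misfire on short or low-complexity proteins that happen to be rich in A, C, G and T, so it is a fallback rather than a substitute for type information supplied by the database.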

Mark

Mark Schreiber
Bioinformatics
AgResearch Invermay
PO Box 50034
Mosgiel
New Zealand

PH: +64 3 489 9175

 

> -----Original Message-----
> From: Jason Stajich [mailto:jason@chg.mc.duke.edu]
> Sent: Wednesday, May 16, 2001 9:19 AM
> To: BioJava List
> Subject: [Biojava-l] sequence dbs
> 
> 
> I started to work on this at biojava bootcamp but didn't get very far
> because of the following: seq.db.SequenceDB currently has the following
> methods that cannot be implemented for 'remote' databases:
> 
>   Set ids();
>   SequenceIterator sequenceIterator();
> 
>   void addSequence(Sequence seq)
>       throws IllegalIDException, BioException, ChangeVetoException;
>   void removeSequence(String id)
>       throws IllegalIDException, BioException, ChangeVetoException;
> 
> I started to split these methods into separate interfaces: LocalSequenceDB
> for ids() and sequenceIterator(), and UpdateableSequenceDB for add/remove.
> This of course breaks all classes that depend on SequenceDB. The other
> option is to create a RemoteSequenceDB that throws ChangeVetoException for
> add/remove calls and some other exception for ids()/sequenceIterator().
> 
> BTW: an example of a RemoteDB is web EMBL queries, which we will patch
> through HTTP to extract a sequence from this database (this will talk to
> Heikki's web script). Similarly, if the GenBank parsing works, we can pass
> queries to NCBI GenBank to look up an accession number.
> 
> One other major issue: what if we do not know what type of sequence we
> are obtaining (protein, DNA or RNA)? Biojava likes to have this established
> in the parser, but I won't really be able to divine anything from an
> accession number. Ideas?
> 
> -jason
> 
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center 
> http://www.chg.duke.edu/ 
> 
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>
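For comparison, Jason's first option, splitting the interface, might look like the sketch below. `CoreSequenceDB` and `InMemorySequenceDB` are hypothetical names, sequences are plain Strings, and BioJava's checked exceptions are left out for brevity. `LocalSequenceDB` and `UpdateableSequenceDB` follow the names proposed in the message.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical trimmed-down contract: lookup by ID, which local and remote DBs can both offer.
interface CoreSequenceDB {
    String getSequence(String id);
}

// Enumeration, which only local databases can support.
interface LocalSequenceDB extends CoreSequenceDB {
    Set<String> ids();
    Iterator<String> sequenceIterator();
}

// Mutation, for databases that accept edits.
interface UpdateableSequenceDB extends CoreSequenceDB {
    void addSequence(String id, String seq);
    void removeSequence(String id);
}

// An in-memory database can implement both extensions.
class InMemorySequenceDB implements LocalSequenceDB, UpdateableSequenceDB {
    private final Map<String, String> store = new HashMap<>();

    @Override
    public String getSequence(String id) {
        String seq = store.get(id);
        if (seq == null) {
            throw new IllegalArgumentException("no such ID: " + id);
        }
        return seq;
    }

    @Override
    public Set<String> ids() {
        // Read-only live view of the stored IDs.
        return Collections.unmodifiableSet(store.keySet());
    }

    @Override
    public Iterator<String> sequenceIterator() {
        return store.values().iterator();
    }

    @Override
    public void addSequence(String id, String seq) {
        store.put(id, seq);
    }

    @Override
    public void removeSequence(String id) {
        store.remove(id);
    }
}
```

A remote database would implement only `CoreSequenceDB`. The cost is the one Jason names: existing code that calls ids() or addSequence() on a plain SequenceDB has to be changed to ask for the narrower interface.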