[Biojava-l] org.biojava.bio.search interfaces

Keith James kdj@sanger.ac.uk
08 Jan 2001 13:45:35 +0000


Hi,

I've made some progress on Fasta search output parsing to the extent
that I'm happy with the initial design and implementation and I'm
about to write a demo application to debug it.

In order to comply with the existing interfaces the parser must be
able to resolve the query sequence id (in the search output file) to a
SymbolList instance and the database filename (also in the search
output file) to a SequenceDB instance. (See the getQuerySequence(),
getSequenceDB() and getSearcher() methods in
SeqSimilaritySearchResult.) At the moment I've achieved this by using
a SequenceDB to hold the query sequences while the parsing takes place
and a SequenceDBInstallation to resolve the database.

In doing this I've had some difficulties with the interfaces, most of
which were caused by my inexperience with Java, but some of which appear
to be issues with the interface design. Specifically, the
SeqSimilaritySearchResult, Hit and SubHit implementations are required
to contain circular references (Result->List->Hit and Hit->Result,
Hit->List->SubHit and SubHit->Hit).

From my experience with using other search parsers it's quite rare
that one needs to refer from a SubHit back up the hierarchy. I think
that the interfaces would be better off without this requirement. (The
SimpleSeqSimilaritySearch classes can't be instantiated because of
this; they create immutable objects by passing all properties to the
constructor. There is a chicken-and-egg situation where each needs the
other to be instantiated first.)
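
To make the construction problem concrete, here is a minimal sketch. The
Result and Hit classes below are hypothetical stand-ins written for this
example, not the actual SimpleSeqSimilaritySearch classes, and the
assumption is that each takes all of its properties in the constructor
and copies its list defensively.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CircularReferenceSketch {

    // Hypothetical stand-in for a search result (not the BioJava class).
    static final class Result {
        private final List<Hit> hits;

        Result(List<Hit> hits) {
            if (hits == null) {
                throw new IllegalArgumentException("hits must not be null");
            }
            // Defensive copy: the result is fixed once constructed.
            this.hits = Collections.unmodifiableList(new ArrayList<Hit>(hits));
        }

        List<Hit> getHits() {
            return hits;
        }
    }

    // Hypothetical stand-in for a hit that must point back at its result.
    static final class Hit {
        private final Result result;

        Hit(Result result) {
            if (result == null) {
                throw new IllegalArgumentException("result must not be null");
            }
            this.result = result;
        }

        Result getSearchResult() {
            return result;
        }
    }

    // Builds a Result first (with no hits), then a Hit that refers to it,
    // and reports whether the Result ended up knowing about that Hit.
    static boolean resultKnowsItsHit() {
        Result result = new Result(new ArrayList<Hit>());
        Hit hit = new Hit(result);
        return result.getHits().contains(hit);
    }

    public static void main(String[] args) {
        System.out.println("Result contains its Hit: " + resultKnowsItsHit());
    }
}
```

Whichever object is built first cannot be told about the other afterwards
without giving up immutability somewhere, for example through a setter or
a list that is filled in after construction.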

I would be interested in what folks think of removing
getSearchResult() from the SeqSimilaritySearchHit interface and getHit()
from the SeqSimilaritySearchSubHit interface. (Or, in fact, in hearing of
neat tricks which are made possible by retaining those methods.)
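
As a rough illustration of what the hierarchy might look like without the
upward references, here is a sketch in which objects only refer downwards
and can therefore be built bottom-up as immutable values. All class,
field and method names here (SubHit, Hit, Result, getSubjectId() and so
on) are hypothetical and chosen for the example; they are not the
existing interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TopDownSketch {

    // Hypothetical sub-hit: holds only its own coordinates, no parent.
    static final class SubHit {
        private final int queryStart;
        private final int queryEnd;

        SubHit(int queryStart, int queryEnd) {
            this.queryStart = queryStart;
            this.queryEnd = queryEnd;
        }

        int getQueryStart() { return queryStart; }
        int getQueryEnd() { return queryEnd; }
    }

    // Hypothetical hit: refers down to its sub-hits only.
    static final class Hit {
        private final String subjectId;
        private final List<SubHit> subHits;

        Hit(String subjectId, List<SubHit> subHits) {
            this.subjectId = subjectId;
            this.subHits = Collections.unmodifiableList(new ArrayList<SubHit>(subHits));
        }

        String getSubjectId() { return subjectId; }
        List<SubHit> getSubHits() { return subHits; }
    }

    // Hypothetical result: refers down to its hits only.
    static final class Result {
        private final List<Hit> hits;

        Result(List<Hit> hits) {
            this.hits = Collections.unmodifiableList(new ArrayList<Hit>(hits));
        }

        List<Hit> getHits() { return hits; }
    }

    // Construction order is simply leaves first: SubHit, then Hit, then Result.
    static Result build() {
        SubHit subHit = new SubHit(1, 120);
        Hit hit = new Hit("subject1", Arrays.asList(subHit));
        return new Result(Arrays.asList(hit));
    }

    public static void main(String[] args) {
        Result result = build();
        for (Hit hit : result.getHits()) {
            System.out.println(hit.getSubjectId() + ": "
                    + hit.getSubHits().size() + " sub-hit(s)");
        }
    }
}
```

Code that walks the hierarchy from the top already has the parent in hand
at each level, which is why the back-references are rarely missed in
practice.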

By the way, thanks to the authors of the tutorials.

cheers,

-- 

-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA