[Biojava-l] accessing blast and genbank

David Huen david.huen at ntlworld.com
Mon Feb 10 10:48:53 EST 2003


On Monday 10 Feb 2003 9:32 am, jawue001 at uni-duesseldorf.de wrote:
> Hi. I am wondering if there exists an implementation to access databases
> (like blast) from within Java (without the webinterface). I have seen it
> as Perl modules, but I rather need it for Java.

If you mean accessing FORMATDB databases, then no, there isn't a class 
to do it right now.  I do have a SequenceDB class in the making that will 
(when finished) read the DNA databases produced by NCBI FORMATDB, using 
information obtained by partly reverse-engineering the format.  But such 
classes are inherently dangerous in that the format is not really open and 
is subject to change, which would mean a higher maintenance load on me.  
I'm not sure whether this class is really a good thing.
>
> I have read the mailing list and found the link to the "Blast Java
> Library" by Patrick McConnell. Can anyone tell me if this is the right
> way to go? As I understand, an external program is called (the original
> blast software?). Is there a reason why whatever the blast program does
> cannot be implemented in Java?
>
There is no reason why Blast cannot be implemented in Java if you really 
must have it.  But BLASTN, which I have looked into in detail, is a 
finely crafted piece of C code that uses comparison and pointer tricks to 
get speed, and it is highly unlikely you could be as fast as that in Java.  
Similarly, for efficiency, you couldn't have it run over the current 
SymbolLists; you would need some packed implementation so that multiple 
symbols can be compared with a single test.  In effect, to get anything 
remotely near the C performance, you'll need to write C-style Java and 
treat sequences as bit-patterns, etc.
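
To make the "packed implementation" idea concrete, here is a minimal sketch 
in Java.  It is only an illustration: the class name PackedDna, the 2-bit 
codes (A=0, C=1, G=2, T=3) and the method names are assumptions made up for 
this example.  It is not BLASTN's actual data layout and not a BioJava API.  
It packs up to 32 bases into one long, so that a single == compares all of 
them at once, and an XOR plus a bit count gives the number of mismatches.

```java
// Hypothetical helper for illustration only; not part of BioJava or BLAST.
final class PackedDna {
    private PackedDna() {}

    // Assumed 2-bit encoding: A=0, C=1, G=2, T=3 (no ambiguity symbols).
    static int code(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:
                throw new IllegalArgumentException("Unsupported symbol: " + base);
        }
    }

    // Packs `length` bases (at most 32) starting at `start` into one long.
    // The first base ends up in the highest-order bits that are used.
    static long pack(String seq, int start, int length) {
        if (length < 0 || length > 32) {
            throw new IllegalArgumentException("length must be in 0..32");
        }
        long word = 0L;
        for (int i = 0; i < length; i++) {
            word = (word << 2) | code(seq.charAt(start + i));
        }
        return word;
    }

    // One comparison tests every packed base for identity.
    static boolean wordsMatch(long a, long b) {
        return a == b;
    }

    // Counts differing bases between two words packed with the same length.
    static int mismatches(long a, long b) {
        long diff = a ^ b;
        // Fold each 2-bit pair onto its low bit, then count the set bits.
        long perBase = (diff | (diff >>> 1)) & 0x5555555555555555L;
        return Long.bitCount(perBase);
    }
}
```

For example, PackedDna.wordsMatch(PackedDna.pack(query, 0, 11), 
PackedDna.pack(subject, 0, 11)) tests an 11-base word with one comparison 
instead of eleven symbol-by-symbol tests.  Ambiguity codes and sequences 
longer than one word are left out to keep the sketch short.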

So it could be done, but would anyone really want such an animal considering 
that two perfectly fine and constantly maintained implementations already 
exist in C?  Especially considering it would be slower and would be yet 
another tool for us to maintain?  Perhaps someone might pick up this task 
if sufficiently convincing arguments were raised for why we might want it.

Regards,
David



