[Biojava-l] accessing blast and genbank
David Huen
david.huen at ntlworld.com
Mon Feb 10 10:48:53 EST 2003
On Monday 10 Feb 2003 9:32 am, jawue001 at uni-duesseldorf.de wrote:
> Hi. I am wondering if there exists an implementation to access databases
> (like blast) from within Java (without the web interface). I have seen it
> as Perl modules, but I rather need it for Java.
If you mean accessing FORMATDB databases, then no, there isn't a class
to do it right now. I do have a SequenceDB class in the making that will
(when finished) read the DNA databases formatted by NCBI FORMATDB, using
information obtained by partly reverse-engineering the format. But such
classes are inherently dangerous in that the format is not really open and
is subject to change, which will mean a higher maintenance load on me.
I'm not sure whether this class is really a good thing.
>
> I have read the mailing list and found the link to the "Blast Java
> Library" by Patrick McConnell. Can anyone tell me if this is the right
> way to go? As I understand, an external program is called (the original
> blast software?). Is there a reason why whatever the blast program does
> cannot be implemented in Java?
>
There is no reason why Blast cannot be implemented in Java if you really
must have it. But BLASTN, which I have looked into in detail, is a
finely-crafted piece of C code that uses comparison and pointer
tricks to get speed, and it is highly unlikely you could match its speed
in Java. Similarly, for efficiency, you couldn't run it over the current
SymbolLists; you would need some packed representation so that multiple
symbols can be compared with a single test. In effect, to get anywhere
near the C performance, you'll need to write C-style Java and treat
sequences as bit-patterns, etc.
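To give a rough idea of what "packed" means here, the following is a minimal
sketch, not BLASTN's actual code and not an existing BioJava class. It assumes
unambiguous DNA only (A, C, G, T), stores each base in 2 bits so that one long
holds up to 32 bases, and compares a whole word with a single test. The class
and method names (PackedDna, pack, mismatches) are made up for illustration.

```java
/** Illustrative only: 2-bit packing of unambiguous DNA into a long. */
final class PackedDna {
    private PackedDna() {}

    /** Maps A, C, G, T (either case) to the codes 0, 1, 2, 3. */
    public static int code(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:
                throw new IllegalArgumentException("Not an unambiguous DNA base: " + base);
        }
    }

    /** Packs up to 32 bases into one long; the last base ends up in the lowest 2 bits. */
    public static long pack(String bases) {
        if (bases.length() > 32) {
            throw new IllegalArgumentException("At most 32 bases fit in one long");
        }
        long word = 0L;
        for (int i = 0; i < bases.length(); i++) {
            word = (word << 2) | code(bases.charAt(i));
        }
        return word;
    }

    /** Counts differing base positions between two packed words of equal length. */
    public static int mismatches(long a, long b) {
        long diff = a ^ b;                                   // non-zero 2-bit pair = mismatch
        long perBase = (diff | (diff >>> 1)) & 0x5555555555555555L; // fold each pair onto its low bit
        return Long.bitCount(perBase);
    }

    public static void main(String[] args) {
        long query = pack("ACGTACGTACGT");
        long exact = pack("ACGTACGTACGT");
        long near  = pack("ACGTACCTACGT");                   // one substitution (G -> C)

        System.out.println(query == exact);                  // prints "true": 12 bases, one comparison
        System.out.println(mismatches(query, near));         // prints "1"
    }
}
```

The point is only that a single == or XOR on a long does the work of up to 32
symbol-by-symbol comparisons. Ambiguity symbols, sequences longer than one
word, and word lookup tables are all left out of this sketch.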
So it could be done but would anyone really want such an animal considering
that two perfectly fine and constantly maintained implementations exist
already in C? Especially considering it will be slower and also be yet
another tool for us to maintain? Perhaps someone might pick up this task
if sufficiently convincing arguments were raised why we might want this.
Regards,
David