[Biojava-l] accessing blast and genbank
Thomas Down
td2 at sanger.ac.uk
Mon Feb 10 11:34:54 EST 2003
On Mon, Feb 10, 2003 at 10:48:53AM +0000, David Huen wrote:
>
> > I have read the mailing list and found the link to the "Blast Java
> > Library" by Patrick McConnell. Can anyone tell me if this is the right
> > way to go? As I understand, an external program is called (the original
> > blast software?). Is there a reason why whatever the blast program does
> > cannot be implemented in Java?
> >
> There is no reason why Blast cannot be implemented in Java if you really
> must have it. But BLASTN, which I have looked into in detail, is a
> finely-crafted piece of code in C which deploys comparison and pointer
> tricks to get speed and it is highly unlikely you could be as fast as such
> a piece of code in Java. Similarly, for efficiency, you couldn't have it
> run over the current SymbolLists but will need to use some packed
> implementation to increase the capacity for comparing multiple symbols with
> a single test. In effect, to get anything remotely near the C
> performance, you'll need to write C-style Java and treat sequences as
> bit-patterns, etc.
>
> So it could be done but would anyone really want such an animal considering
> that two perfectly fine and constantly maintained implementations exist
> already in C? Especially considering it will be slower and also be yet
> another tool for us to maintain? Perhaps someone might pick up this task
> if sufficiently convincing arguments were raised why we might want this.
Yes, I have to agree with this -- I'd have to see pretty
compelling reasons to write yet another implementation of that
basic algorithm. It's not too hard to launch blast processes
from Java with Runtime.exec.
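As a rough illustration of the "launch an external blast" route, here is a minimal sketch. It uses ProcessBuilder, a newer JDK API that does the same job as Runtime.exec, because it makes it easy to merge stderr into stdout. The class name ExternalBlast, the helper names, and the database and query paths are all made up for the example. The command line assumes the NCBI blastall binary is on the PATH; adjust it to whatever your installation provides.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of BioJava.
final class ExternalBlast {

    /** Runs a command and returns its combined output; throws if the exit status is non-zero. */
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Merge stderr into stdout so the child cannot stall on an unread error pipe.
        pb.redirectErrorStream(true);
        Process p = pb.start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("command exited with status " + exit + ": " + out);
        }
        return out.toString();
    }

    /** Builds a blastall command line for BLASTN with tabular output (-m 8). Paths are placeholders. */
    static List<String> blastnCommand(String database, String queryFile) {
        return Arrays.asList("blastall", "-p", "blastn",
                "-d", database, "-i", queryFile, "-m", "8");
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ExternalBlast <database> <query.fasta>");
            return;
        }
        System.out.print(run(blastnCommand(args[0], args[1])));
    }
}
```

The output comes back as plain text, so parsing it is still up to the caller.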
What I would add is that BioJava includes code for both exact
dynamic programming and fast word-matching algorithms. Based
on these two, it's possible to build quite a wide range of
'vaguely blast-like' search methods. I certainly wouldn't
recommend building a general purpose blast clone, but if you
need something a bit different, and are more worried about
getting it up and running quickly than extracting the last
few percent of performance, BioJava with the DP and SSAHA
packages might be a good choice.
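To make the "word matching plus dynamic programming" idea concrete, here is a small self-contained sketch in plain Java. It does not use the BioJava DP or SSAHA packages, and every name in it (SeedAndExtend, indexWords, localScore, search) is invented for illustration. It indexes all words of length k in the subject, looks up each query word to find exact seed hits, and then runs a Smith-Waterman local alignment with a linear gap penalty on a window around each seed. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a seed-and-extend search; not BioJava code.
final class SeedAndExtend {

    /** Maps every overlapping word of length k in the subject to its start positions. */
    static Map<String, List<Integer>> indexWords(String subject, int k) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i + k <= subject.length(); i++) {
            index.computeIfAbsent(subject.substring(i, i + k), w -> new ArrayList<>()).add(i);
        }
        return index;
    }

    /** Smith-Waterman local alignment score with a linear gap penalty (gap is negative). */
    static int localScore(String a, String b, int match, int mismatch, int gap) {
        int[] prev = new int[b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                int diag = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = prev[j] + gap;
                int left = cur[j - 1] + gap;
                // Local alignment: a cell never drops below zero.
                cur[j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, cur[j]);
            }
            prev = cur;
        }
        return best;
    }

    /** Best local score over subject windows around exact word hits; 0 if there are no seeds. */
    static int search(String query, String subject, int k, int flank) {
        Map<String, List<Integer>> index = indexWords(subject, k);
        int best = 0;
        for (int q = 0; q + k <= query.length(); q++) {
            List<Integer> hits = index.get(query.substring(q, q + k));
            if (hits == null) {
                continue;
            }
            for (int s : hits) {
                // The subject region the query would cover if aligned on this seed, plus a flank.
                int start = Math.max(0, s - q - flank);
                int end = Math.min(subject.length(), s - q + query.length() + flank);
                best = Math.max(best,
                        localScore(query, subject.substring(start, end), 1, -1, -2));
            }
        }
        return best;
    }
}
```

For example, SeedAndExtend.search("ACGTACGT", "TTTTACGTACGTTTTT", 4, 2) finds the embedded copy of the query via its ACGT seeds. The sketch realigns the same window once per seed, so a real tool would first merge seeds that fall on the same diagonal.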
Thomas.