[Biojava-dev] Fwd: [blast-announce] New Version of BLAST XML output
Jose Manuel Duarte
jose.duarte at psi.ch
Mon May 11 09:19:03 UTC 2015
Just one more comment regarding alternatives to blast. I recently came
across one that is not as sensitive as blast but a lot faster. It's
called lambda:
http://www.seqan.de/projects/lambda/
I've tried it out and I'm very impressed with the results: it can do
full UniRef100 searches in a fraction of a second. There are still some
issues to iron out, especially in the indexing, which is very memory and
disk hungry. But all in all it does seem to be a real alternative to blast.
Their output is blast compatible: they can do either classic pairwise
output (-m 0) or tabular output (-m 8). No XML output yet though.
So this supports the case for having some kind of framework that can
deal with the results of a sequence homology search. The actual parsers
would then be implemented on a per-case basis.
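A rough sketch of what such a framework could look like in Java: one
program-independent hit model plus one parser class per output format.
The names SearchHit, SearchResultParser and TabularResultParser are made
up for illustration and are not existing BioJava classes. The parser
assumes the standard 12-column tabular layout produced by -m 8 (query id,
subject id, % identity, alignment length, mismatches, gap opens, query
start/end, subject start/end, e-value, bit score).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical program-independent model of one hit (not BioJava API). */
final class SearchHit {
    final String queryId;
    final String subjectId;
    final double percentIdentity;
    final int alignmentLength;
    final double evalue;
    final double bitScore;

    SearchHit(String queryId, String subjectId, double percentIdentity,
              int alignmentLength, double evalue, double bitScore) {
        this.queryId = queryId;
        this.subjectId = subjectId;
        this.percentIdentity = percentIdentity;
        this.alignmentLength = alignmentLength;
        this.evalue = evalue;
        this.bitScore = bitScore;
    }
}

/** Hypothetical contract: one implementation per output format. */
interface SearchResultParser {
    List<SearchHit> parse(Reader in) throws IOException;
}

/** Parser for 12-column tabular output (-m 8); assumed column order. */
final class TabularResultParser implements SearchResultParser {
    @Override
    public List<SearchHit> parse(Reader in) throws IOException {
        List<SearchHit> hits = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            // Skip blank lines and comment lines.
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] f = line.split("\t");
            if (f.length < 12) {
                throw new IOException(
                    "Expected 12 tab-separated columns, got " + f.length);
            }
            hits.add(new SearchHit(
                f[0].trim(),                       // query id
                f[1].trim(),                       // subject id
                Double.parseDouble(f[2].trim()),   // % identity
                Integer.parseInt(f[3].trim()),     // alignment length
                Double.parseDouble(f[10].trim()),  // e-value
                Double.parseDouble(f[11].trim())   // bit score
            ));
        }
        return hits;
    }
}
```

Supporting another format (for instance BLAST XML) would then mean adding
one more SearchResultParser implementation, while code that consumes
SearchHit objects stays unchanged.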
Jose
On 10.05.2015 14:04, Paolo Pavan wrote:
> Hello!
> I obviously share the opinion of Peter and Jose. Moreover, as already
> written, I have used this new feature in a second work that I could
> also describe and submit to biojava, if of any interest.
>
> About Andreas' questions:
> "Does your module support psiblast, rpsblast, tblastx and blast+ and
> what versions?": For now, it supports blastn, blastp, blastx,
> tblastn and tblastx, version 2.2.29. I'm not sure about psiblast
> and rpsblast; I should test them.
> But it has been designed so that updating a single parser (or adding
> a new search program while staying within the designed framework)
> requires writing just a single class. This will keep the code simple
> and neat, which in my opinion is very important for future
> developers.
>
> "the disadvantage is that you constantly need to update them to the
> variant of blast plus version of the output file format": this
> unfortunately is a problem that all of us have to face if we want
> to use new ncbi programs. It happened for legacy-blast, it happened
> many times for the genbank format, and it is happening for blast+.
> I just hope that they will have the kindness to state the format
> version explicitly inside the xml, if not to name the program itself
> differently, for example blast3 or blast++, to avoid confusion. We
> can't do anything about that; we can just try to make things simple
> and easy to reuse.
>
> Just to express my opinion, I think that every bio project should
> first of all address these "base level" problems before others, to
> allow the developer to focus on higher-level abstractions. I'm sure
> that this will be appreciated by the community, increasing the user
> base of biojava.
>
> Paolo
>
> 2015-05-06 12:15 GMT+02:00 Jose Manuel Duarte <jose.duarte at psi.ch>:
>
> I'd say that having some common data structure to model the output
> of a sequence homology search would be beneficial. For instance a
> blast alternative might appear one day (I'm eagerly awaiting
> it!). The common data structure should be able to model the
> outputs of any of the different programs.
>
> There are already some alternatives to blast:
>
> SANS and SANSparallel by Liisa Holm
> (http://www.ncbi.nlm.nih.gov/pubmed/22962464,
> http://nar.oxfordjournals.org/content/early/2015/04/08/nar.gkv317.full)
> USEARCH (commercial) (http://drive5.com/usearch/)
> BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html#blat3)
>
> In fact SANSparallel looks very promising, it's incredibly fast
> though less sensitive than blast.
>
> Cheers
>
> Jose
>
>
>
>
> On 06.05.2015 10:47, Peter Cock wrote:
>
> On Wed, May 6, 2015 at 6:02 AM, Andreas Prlic
> <andreas at sdsc.edu> wrote:
>
> On Tue, May 5, 2015 at 1:18 PM, Paolo Pavan
> <paolo.pavan at gmail.com> wrote:
>
> As seen in other Bio projects, alongside the Sequence IO and
> Alignment IO procedures it could also have a Search result IO.
>
> I never understood why other Bio* projects have special Blast
> modules. Perhaps XML parsing is not as easy as it is in Java?
> Please see the code at the bottom of this message.
>
> Python at least has a range of XML parsing libraries which are up
> to the task. However, as Paolo wrote:
>
> The advantage is to define common data structures that model Hsp,
> Hits, Results without taking care (ie. making abstraction) of the
> underlying search program.
>
> This is the big advantage of BioPerl and Biopython's SearchIO
> module. You can at least in theory switch between parsing BLAST XML,
> BLAST tabular, BLAST plain text (shudder), or another related format
> without major changes to your code.
>
> and the disadvantage is that you constantly need to update them to
> the variant of blast plus version of the output file format.
>
> I think it is much better to have this housekeeping done once
> centrally in a Bio* library than re-invented by everyone parsing the
> BLAST output. However, the NCBI BLAST XML output has been fairly
> stable, and the new output has a formal schema so should be even more
> dependable.
>
> Peter
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at mailman.open-bio.org
> <mailto:biojava-dev at mailman.open-bio.org>
> http://mailman.open-bio.org/mailman/listinfo/biojava-dev
>
>