[Biojava-dev] Fwd: [blast-announce] New Version of BLAST XML output

Jose Manuel Duarte jose.duarte at psi.ch
Mon May 11 09:19:03 UTC 2015


Just one more comment regarding alternatives to blast. Recently I came 
across one such alternative that is not as sensitive as blast but a 
lot faster; it's called lambda:

http://www.seqan.de/projects/lambda/

I've tried it out and I'm very impressed with the results: it can do 
full UniRef100 searches in a split second. There are still some 
issues to iron out, especially in the indexing, which is very memory- and 
disk-hungry. But all in all it does seem to be a real alternative to blast.

Their output is blast-compatible: they can produce either classic pairwise 
output (-m 0) or tabular output (-m 8). No XML output yet, though.
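Since lambda writes blast-style tabular output, one reader for that format could serve both programs. Below is a minimal Java sketch, assuming the 12 standard columns of legacy "-m 8" (BLAST+ "-outfmt 6") output: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The class and field names are hypothetical and are not existing biojava API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical reader for blast-style tabular output (legacy "-m 8", BLAST+ "-outfmt 6"),
 * assuming the 12 standard columns in the standard order.
 */
final class TabularHitParser {

    /** One row of tabular output. */
    static final class TabularHit {
        final String queryId;
        final String subjectId;
        final double percentIdentity;
        final int alignmentLength;
        final int mismatches;
        final int gapOpens;
        final int queryStart, queryEnd, subjectStart, subjectEnd;
        final double evalue;
        final double bitScore;

        TabularHit(String[] f) {
            queryId = f[0].trim();
            subjectId = f[1].trim();
            percentIdentity = Double.parseDouble(f[2].trim());
            alignmentLength = Integer.parseInt(f[3].trim());
            mismatches = Integer.parseInt(f[4].trim());
            gapOpens = Integer.parseInt(f[5].trim());
            queryStart = Integer.parseInt(f[6].trim());
            queryEnd = Integer.parseInt(f[7].trim());
            subjectStart = Integer.parseInt(f[8].trim());
            subjectEnd = Integer.parseInt(f[9].trim());
            evalue = Double.parseDouble(f[10].trim());   // handles "1e-50", "0.0", ...
            bitScore = Double.parseDouble(f[11].trim());
        }
    }

    /** Parses data rows, skipping blank lines and '#' comment lines (as in "-m 9" output). */
    static List<TabularHit> parse(List<String> lines) {
        List<TabularHit> hits = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\t");
            if (fields.length < 12) {
                throw new IllegalArgumentException("expected 12 tab-separated columns: " + line);
            }
            hits.add(new TabularHit(fields));
        }
        return hits;
    }
}
```

Because it depends only on the column layout, the same code should in principle read tabular files from blast or from lambda, provided lambda keeps the same column order.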

So this would support the case for having some kind of framework that can 
deal with the results of a sequence homology search. The actual parsers 
would then be implemented on a per-format basis.
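To make the idea more concrete, here is a rough Java sketch of such a framework: a program-independent Result/Hit/Hsp model, one parser class per output format, and a registry that client code goes through. All names (SearchModel, ResultParser, TabularParser, the "tabular" key) are hypothetical illustrations rather than biojava classes, and the HSP keeps only two fields for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical program-independent model: one Result per query, Hits per subject, HSPs per Hit. */
final class SearchModel {

    static final class Hsp {
        final double evalue;
        final double bitScore;

        Hsp(double evalue, double bitScore) {
            this.evalue = evalue;
            this.bitScore = bitScore;
        }
    }

    static final class Hit {
        final String subjectId;
        final List<Hsp> hsps = new ArrayList<>();

        Hit(String subjectId) {
            this.subjectId = subjectId;
        }
    }

    static final class Result {
        final String queryId;
        final List<Hit> hits = new ArrayList<>();

        Result(String queryId) {
            this.queryId = queryId;
        }
    }

    /** One implementation per output format (tabular, legacy XML, XML2, ...). */
    interface ResultParser {
        List<Result> parse(List<String> lines);
    }

    /** Format name -> parser; supporting a new format means registering one more class. */
    private static final Map<String, ResultParser> PARSERS = new HashMap<>();

    static void register(String format, ResultParser parser) {
        PARSERS.put(format, parser);
    }

    /** Entry point for client code, which never touches a concrete parser. */
    static List<Result> parse(String format, List<String> lines) {
        ResultParser parser = PARSERS.get(format);
        if (parser == null) {
            throw new IllegalArgumentException("no parser registered for format: " + format);
        }
        return parser.parse(lines);
    }

    /** Example parser for 12-column tabular output: groups rows by query, then by subject. */
    static final class TabularParser implements ResultParser {
        @Override
        public List<Result> parse(List<String> lines) {
            Map<String, Result> byQuery = new LinkedHashMap<>();   // keeps input order
            for (String line : lines) {
                if (line.isBlank() || line.startsWith("#")) {
                    continue;
                }
                String[] f = line.split("\t");
                if (f.length < 12) {
                    throw new IllegalArgumentException("expected 12 columns: " + line);
                }
                Result result = byQuery.computeIfAbsent(f[0], Result::new);
                Hit hit = null;
                for (Hit h : result.hits) {
                    if (h.subjectId.equals(f[1])) {
                        hit = h;
                        break;
                    }
                }
                if (hit == null) {
                    hit = new Hit(f[1]);
                    result.hits.add(hit);
                }
                hit.hsps.add(new Hsp(Double.parseDouble(f[10].trim()), Double.parseDouble(f[11].trim())));
            }
            return new ArrayList<>(byQuery.values());
        }
    }

    static {
        register("tabular", new TabularParser());
    }
}
```

With this layout, supporting lambda's pairwise output or a new XML version would amount to writing one more ResultParser and registering it, while code that walks Results, Hits and HSPs stays the same.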

Jose



On 10.05.2015 14:04, Paolo Pavan wrote:
> Hello!
> I obviously share Peter's and Jose's opinion. Moreover, as already 
> written, I have used this new feature in a second project that I could 
> also describe and submit to biojava, if of any interest.
>
> About Andreas' questions:
> "Does your module support psiblast, rpsblast, tblastx and blast+ and 
> what versions?": At the moment it supports blastn, blastp, blastx, 
> tblastn and tblastx version 2.2.29. I'm not sure about psiblast 
> and rpsblast; I should test them.
> But it has been designed so that updating a single parser (or adding 
> a new search program while staying within the designed framework) 
> requires writing just a single class. This keeps the code simple 
> and neat, which in my opinion is very important for future 
> developers.
>
> "the disadvantage is that you constantly need to update them to the 
> variant of blast plus version of the output file format": unfortunately 
> this is a problem that every one of us has to face if we want 
> to use new NCBI programs. It happened with legacy blast, it happened 
> many times with the GenBank format, and it is happening with blast+. We 
> can only hope that they will be kind enough to make the format version 
> explicit inside the XML, if not to name the program itself differently 
> (for example blast3 or blast++) to avoid confusion. We can't do 
> anything about that; we can just try to keep things simple and 
> easy to reuse.
>
> Just to express my opinion, I think that every bio project should 
> first of all address these "base level" problems, so that developers 
> can focus on higher-level abstractions. I'm sure that this will be 
> appreciated by the community and will increase the user base of 
> biojava.
>
> Paolo
>
> 2015-05-06 12:15 GMT+02:00 Jose Manuel Duarte <jose.duarte at psi.ch>:
>
>     I'd say that having some common data structure to model the output
>     of a sequence homology search would be beneficial. For instance, a
>     blast alternative might appear one day (I'm eagerly awaiting
>     it!). The common data structure should be able to model the
>     outputs of any of the different programs.
>
>     There are already some alternatives to blast:
>
>     SANS and SANSparallel by Liisa Holm
>     (http://www.ncbi.nlm.nih.gov/pubmed/22962464,
>     http://nar.oxfordjournals.org/content/early/2015/04/08/nar.gkv317.full)
>     USEARCH (commercial) (http://drive5.com/usearch/)
>     BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html#blat3)
>
>     In fact SANSparallel looks very promising, it's incredibly fast
>     though less sensitive than blast.
>
>     Cheers
>
>     Jose
>
>
>
>
>     On 06.05.2015 10:47, Peter Cock wrote:
>
>         On Wed, May 6, 2015 at 6:02 AM, Andreas Prlic
>         <andreas at sdsc.edu> wrote:
>
>             On Tue, May 5, 2015 at 1:18 PM, Paolo Pavan
>             <paolo.pavan at gmail.com> wrote:
>
>                 As in other Bio* projects, alongside the Sequence IO
>                 and Alignment IO
>                 procedures it could also have a Search result IO.
>
>             I never understood why other Bio* projects have special
>             Blast modules.
>             Perhaps XML parsing is not as easy as it is in Java?
>             Please see the code at
>             the bottom of this message.
>
>         Python at least has a range of XML parsing libraries which are
>         up to the
>         task. However, as Paolo wrote:
>
>                 The advantage is to define common data structures that
>                 model HSPs, Hits and
>                 Results without caring about (i.e. abstracting away)
>                 the underlying
>                 search program.
>
>         This is the big advantage of BioPerl and Biopython's SearchIO
>         module.
>         You can at least in theory switch between parsing BLAST XML, BLAST
>         tabular, BLAST plain text (shudder), or another related format
>         without
>         major changes to your code.
>
>             and the disadvantage is that you constantly need to update
>             them to the
>             variant of blast plus version of the output file format.
>
>         I think it is much better to have this housekeeping done once
>         centrally in
>         a Bio* library than re-invented by everyone parsing the BLAST
>         output.
>         However, the NCBI BLAST XML output has been fairly stable, and the
>         new output has a formal schema so should be even more dependable.
>
>         Peter
>         _______________________________________________
>         biojava-dev mailing list
>         biojava-dev at mailman.open-bio.org
>         http://mailman.open-bio.org/mailman/listinfo/biojava-dev
>
>
>
>
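On the point raised in the thread about telling output format versions apart: since the new BLAST XML comes with a formal schema, it presumably also has its own root element, so a reader could dispatch on the root element alone. The Java sketch below uses the JDK's StAX API (javax.xml.stream) to read only the first element. The root names it checks ("BlastOutput" for the legacy format, "BlastOutput2" or "BlastXML2" for the new one) are assumptions to verify against NCBI's DTD and XSD, and BlastXmlSniffer is a hypothetical name.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * Hypothetical helper that guesses the BLAST XML flavour from the root element only.
 * Assumed root names: "BlastOutput" (legacy), "BlastOutput2" / "BlastXML2" (new schema).
 */
final class BlastXmlSniffer {

    enum Flavour { LEGACY, XML2, UNKNOWN }

    static Flavour detect(Reader in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: with DTD support off, the parser does not try to fetch the
        // DTD referenced in the DOCTYPE of legacy BLAST XML files.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                // The first START_ELEMENT event is the root; stop reading there.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return classify(reader.getLocalName());
                    }
                }
                return Flavour.UNKNOWN;
            } finally {
                reader.close();   // does not close the underlying Reader
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    private static Flavour classify(String rootName) {
        switch (rootName) {
            case "BlastOutput":
                return Flavour.LEGACY;
            case "BlastOutput2":
            case "BlastXML2":
                return Flavour.XML2;
            default:
                return Flavour.UNKNOWN;
        }
    }
}
```

A registry of parsers keyed by flavour could then pick the right class without the caller having to know which blast+ version produced the file.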


