[Biojava-dev] Fwd: [blast-announce] New Version of BLAST XML output

Wed May 6 10:15:24 UTC 2015

I'd say that having a common data structure to model the output of a 
sequence homology search would be beneficial. For instance, a blast 
alternative might appear one day (I'm eagerly awaiting it!). The 
common data structure should be able to model the output of any of the 
different programs.
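
As a rough sketch of what I mean, something along these lines might do. 
All class, field and method names here are hypothetical; they are not 
existing BioJava classes, and the fields are only the ones I would 
assume most search programs can report.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, program-agnostic model of a homology search result.
// None of these names come from BioJava; they are illustrative only.
public class SearchResultSketch {

    /** One local alignment (high-scoring segment pair) between query and hit. */
    static final class Hsp {
        final int queryStart, queryEnd, hitStart, hitEnd;
        final double bitScore, evalue;

        Hsp(int queryStart, int queryEnd, int hitStart, int hitEnd,
            double bitScore, double evalue) {
            this.queryStart = queryStart;
            this.queryEnd = queryEnd;
            this.hitStart = hitStart;
            this.hitEnd = hitEnd;
            this.bitScore = bitScore;
            this.evalue = evalue;
        }
    }

    /** One database sequence matched by the query, with all of its HSPs. */
    static final class Hit {
        final String id;
        final List<Hsp> hsps;

        Hit(String id, List<Hsp> hsps) {
            this.id = id;
            this.hsps = Collections.unmodifiableList(new ArrayList<>(hsps));
        }

        /** Highest bit score among the HSPs; negative infinity if there are none. */
        double bestBitScore() {
            double best = Double.NEGATIVE_INFINITY;
            for (Hsp hsp : hsps) {
                best = Math.max(best, hsp.bitScore);
            }
            return best;
        }
    }

    /** All hits for one query, tagged with the program that produced them. */
    static final class Result {
        final String program;
        final String queryId;
        final List<Hit> hits;

        Result(String program, String queryId, List<Hit> hits) {
            this.program = program;
            this.queryId = queryId;
            this.hits = Collections.unmodifiableList(new ArrayList<>(hits));
        }
    }

    /** Each output format (blast XML, tabular, another program) would implement this. */
    interface ResultParser {
        List<Result> parse(Reader in) throws IOException;
    }

    public static void main(String[] args) {
        // Build a result by hand; a real parser would fill this in from a file.
        Hsp first = new Hsp(1, 100, 5, 104, 180.0, 1e-40);
        Hsp second = new Hsp(120, 200, 130, 210, 250.0, 1e-60);
        Hit hit = new Hit("P12345", Arrays.asList(first, second));
        Result result = new Result("blastp", "query1", Collections.singletonList(hit));

        for (Hit h : result.hits) {
            System.out.println(h.id + " best bit score: " + h.bestBitScore());
        }
    }
}
```

Code written against Result, Hit and Hsp would then not need to care 
which program or file format the data came from; only the ResultParser 
implementation would change.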

There are already some alternatives to blast:

SANS and SANSparallel by Liisa Holm 
(http://www.ncbi.nlm.nih.gov/pubmed/22962464, 
http://nar.oxfordjournals.org/content/early/2015/04/08/nar.gkv317.full)
USEARCH (commercial) (http://drive5.com/usearch/)
BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html#blat3)

In fact, SANSparallel looks very promising: it's incredibly fast, though 
less sensitive than blast.

Cheers

Jose

On 06.05.2015 10:47, Peter Cock wrote:
> On Wed, May 6, 2015 at 6:02 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>> On Tue, May 5, 2015 at 1:18 PM, Paolo Pavan <paolo.pavan at gmail.com> wrote:
>>> As seen in other Bio projects, alongside the Sequence IO and Alignment IO
>>> procedures it could also have a Search result IO.
>> I never understood why other Bio* projects have special Blast modules.
>> Perhaps XML parsing is not as easy as it is in Java? Please see the code at
>> the bottom of this message.
> Python at least has a range of XML parsing libraries which are up to the
> task. However, as Paolo wrote:
>
>>> The advantage is to define common data structures that model Hsps, Hits,
>>> Results without taking care (i.e. abstracting away) of the underlying
>>> search program.
> This is the big advantage of BioPerl and Biopython's SearchIO module.
> You can at least in theory switch between parsing BLAST XML, BLAST
> tabular, BLAST plain text (shudder), or another related format without
> major changes to your code.
>
>> and the disadvantage is that you constantly need to update them to match
>> the variant of blast plus the version of the output file format.
> I think it is much better to have this housekeeping done once centrally in
> a Bio* library than re-invented by anyone parsing the BLAST output.
> However, the NCBI BLAST XML output has been fairly stable, and the
> new output has a formal schema so should be even more dependable.
>
> Peter