[Biojava-dev] Fwd: [blast-announce] New Version of BLAST XML output
Erik McKee
emckee2006 at gmail.com
Mon May 11 09:30:06 UTC 2015
How does gmap compare to these?
On May 11, 2015 5:26 AM, "Jose Manuel Duarte" <jose.duarte at psi.ch> wrote:
> Just one more comment regarding alternatives to blast. Recently I came
> across an alternative that is not as sensitive as blast but a lot
> faster; it's called lambda:
>
> http://www.seqan.de/projects/lambda/
>
> I've tried it out and I'm very impressed with the results: it can do full
> UniRef100 searches in a split second. There are still some issues to
> iron out, especially in the indexing, which is very memory- and
> disk-hungry. But all in all it does seem to be a real alternative to blast.
>
> Their output is blast-compatible: they can do either classic pairwise
> output (-m 0) or tabular output (-m 8). No XML output yet, though.
>
> So this supports the case for some kind of framework that can deal
> with the results of a sequence homology search. The actual parsers
> would then be implemented on a per-case basis.
>
> Jose
>
>
>
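A per-case parser for lambda's blast-compatible tabular output could be quite small. The Java sketch below assumes the 12 standard tab-separated columns of `-m 8` output (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score) and Java 16+ for records; `TabularHsp` and `TabularParser` are hypothetical names, not existing BioJava classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** One row of blast tabular (-m 8) output. Hypothetical type, not BioJava API. */
record TabularHsp(String queryId, String subjectId, double percentIdentity,
                  int alignmentLength, int mismatches, int gapOpens,
                  int queryStart, int queryEnd, int subjectStart, int subjectEnd,
                  double evalue, double bitScore) {}

final class TabularParser {
    private static final int COLUMNS = 12;

    /** Parses 12-column tab-separated rows; the caller owns (and closes) the reader. */
    static List<TabularHsp> parse(Reader in) throws IOException {
        List<TabularHsp> hsps = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            // Skip blank lines and '#' comment lines.
            if (line.isBlank() || line.startsWith("#")) continue;
            String[] f = line.split("\t");
            if (f.length < COLUMNS) {
                throw new IOException("Expected " + COLUMNS + " columns: " + line);
            }
            // Some blast versions pad numeric fields with spaces.
            for (int i = 0; i < f.length; i++) f[i] = f[i].trim();
            hsps.add(new TabularHsp(f[0], f[1], Double.parseDouble(f[2]),
                    Integer.parseInt(f[3]), Integer.parseInt(f[4]), Integer.parseInt(f[5]),
                    Integer.parseInt(f[6]), Integer.parseInt(f[7]),
                    Integer.parseInt(f[8]), Integer.parseInt(f[9]),
                    Double.parseDouble(f[10]), Double.parseDouble(f[11])));
        }
        return hsps;
    }
}
```

Columns beyond the twelfth are ignored, so a parser like this tolerates tools that append extra fields, while a row with too few columns fails loudly instead of producing a half-filled record.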
> On 10.05.2015 14:04, Paolo Pavan wrote:
>
> Hello!
> I obviously share Peter's and Jose's opinion. Moreover, as I already
> wrote, I have used this new feature in a second project that I could also
> describe and submit to BioJava, if there is any interest.
>
> About Andreas' questions:
> "Does your module support psiblast, rpsblast, tblastx and blast+ and
> what versions?": At the moment, it supports blastn, blastp, blastx,
> tblastn and tblastx, version 2.2.29. I'm not sure about psiblast and
> rpsblast; I would need to test them.
> But it has been designed so that updating a single parser (or adding a
> new search program while staying within the framework) only requires
> writing a single class. This keeps the code simple and neat, which in
> my opinion is very important for future developers.
>
> "the disadvantage is that you constantly need to update them to the
> variant of blast plus version of the output file format": unfortunately,
> this is a problem that every one of us has to face if we want to use
> new NCBI programs. It happened with legacy BLAST, it happened many times
> with the GenBank format, and it is happening with BLAST+. I just hope
> they will be kind enough to state the format version explicitly inside
> the XML, if not to name the program itself differently (for example
> blast3 or blast++) to avoid confusion. We can't do anything about that;
> we can only try to keep things simple and easy to reuse.
>
> Just to express my opinion, I think that every Bio* project should
> address these "base level" problems first of all, so that developers
> can focus on higher levels of abstraction. I'm sure this will be
> appreciated by the community and will increase BioJava's user base.
>
> Paolo
>
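Paolo's design goal, where supporting a new format or search program means writing one class, maps naturally onto a small parser interface plus a lookup by format name. A hedged Java sketch (Java 16+ for records): `SearchResultParser`, `ParserRegistry`, `SimpleHit` and the four-column toy format are all hypothetical, chosen only to keep the example short; they are not Paolo's module or BioJava API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Program-neutral hit with the fields most search tools report (hypothetical). */
record SimpleHit(String queryId, String subjectId, double evalue, double bitScore) {}

/** One implementation per output format. */
interface SearchResultParser {
    String formatName();
    List<SimpleHit> parse(String text);
}

/** Toy format for illustration: tab-separated "query subject evalue bitscore". */
final class FourColumnParser implements SearchResultParser {
    public String formatName() { return "four-column"; }

    public List<SimpleHit> parse(String text) {
        List<SimpleHit> hits = new ArrayList<>();
        for (String line : text.split("\\R")) { // \R matches any line break
            if (line.isBlank()) continue;
            String[] f = line.trim().split("\t");
            hits.add(new SimpleHit(f[0], f[1],
                    Double.parseDouble(f[2]), Double.parseDouble(f[3])));
        }
        return hits;
    }
}

/** Looks parsers up by format name, so callers never name a concrete class. */
final class ParserRegistry {
    private final Map<String, SearchResultParser> parsers = new TreeMap<>();

    void register(SearchResultParser parser) {
        parsers.put(parser.formatName(), parser);
    }

    List<SimpleHit> parse(String format, String text) {
        SearchResultParser parser = parsers.get(format);
        if (parser == null) {
            throw new IllegalArgumentException("No parser for format: " + format);
        }
        return parser.parse(text);
    }
}
```

Calling code only passes a format name, so adding another tool's output would mean registering one more implementation rather than touching the callers.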
> 2015-05-06 12:15 GMT+02:00 Jose Manuel Duarte <jose.duarte at psi.ch>:
>
>> I'd say that having some common data structure to model the output of a
>> sequence homology search would be beneficial. For instance, a blast
>> alternative might appear one day (I'm eagerly awaiting it!). The common
>> data structure should be able to model the output of any of these
>> programs.
>>
>> There are already some alternatives to blast:
>>
>> SANS and SANSparallel by Liisa Holm (
>> http://www.ncbi.nlm.nih.gov/pubmed/22962464,
>> http://nar.oxfordjournals.org/content/early/2015/04/08/nar.gkv317.full)
>> USEARCH (commercial) (http://drive5.com/usearch/)
>> BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html#blat3)
>>
>> In fact, SANSparallel looks very promising: it's incredibly fast, though
>> less sensitive than blast.
>>
>> Cheers
>>
>> Jose
>>
>>
>>
>>
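One possible shape for the common data structure Jose describes is a Result/Hit/Hsp hierarchy that any tool's output can be mapped onto. The Java sketch below uses hypothetical types (not BioJava's API; Java 16+ for records) and shows how flat rows, such as those a tabular parser produces, can be grouped into that hierarchy while keeping the order in which queries and subjects first appear.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One local alignment (hypothetical type). */
record Hsp(double evalue, double bitScore, int queryStart, int queryEnd,
           int subjectStart, int subjectEnd) {}

/** All HSPs between one query and one subject. */
record Hit(String subjectId, List<Hsp> hsps) {
    /** Lowest e-value among this hit's HSPs (NaN if there are none). */
    double bestEvalue() {
        return hsps.stream().mapToDouble(Hsp::evalue).min().orElse(Double.NaN);
    }
}

/** All hits for one query, tagged with the program that produced them. */
record Result(String queryId, String program, List<Hit> hits) {}

/** A flat row as most formats can provide it: two ids plus one HSP. */
record Row(String queryId, String subjectId, Hsp hsp) {}

final class ResultBuilder {
    /** Groups rows by query, then by subject; LinkedHashMap keeps first-seen order. */
    static List<Result> group(String program, List<Row> rows) {
        Map<String, Map<String, List<Hsp>>> byQuery = new LinkedHashMap<>();
        for (Row r : rows) {
            byQuery.computeIfAbsent(r.queryId(), q -> new LinkedHashMap<>())
                   .computeIfAbsent(r.subjectId(), s -> new ArrayList<>())
                   .add(r.hsp());
        }
        List<Result> results = new ArrayList<>();
        byQuery.forEach((query, bySubject) -> {
            List<Hit> hits = new ArrayList<>();
            bySubject.forEach((subject, hsps) -> hits.add(new Hit(subject, List.copyOf(hsps))));
            results.add(new Result(query, program, List.copyOf(hits)));
        });
        return results;
    }
}
```

Keeping the model free of format-specific fields is the design choice that lets a blast, lambda or SANS parser all target the same types; tool-specific extras would have to live elsewhere.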
>> On 06.05.2015 10:47, Peter Cock wrote:
>>
>>> On Wed, May 6, 2015 at 6:02 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>>>
>>>> On Tue, May 5, 2015 at 1:18 PM, Paolo Pavan <paolo.pavan at gmail.com>
>>>> wrote:
>>>>
>>>>> As seen in other Bio projects, alongside the Sequence IO and Alignment
>>>>> IO procedures it could also have a Search result IO.
>>>>>
>>>> I never understood why other Bio* projects have special Blast modules.
>>>> Perhaps XML parsing is not as easy as it is in Java? Please see the
>>>> code at
>>>> the bottom of this message.
>>>>
>>> Python at least has a range of XML parsing libraries which are up to the
>>> task. However, as Paolo wrote:
>>>
>>>>> The advantage is to define common data structures that model Hsps,
>>>>> Hits and Results without taking care of (i.e. abstracting away) the
>>>>> underlying search program.
>>>>>
>>> This is the big advantage of BioPerl and Biopython's SearchIO module.
>>> You can at least in theory switch between parsing BLAST XML, BLAST
>>> tabular, BLAST plain text (shudder), or another related format without
>>> major changes to your code.
>>>
>>>> and the disadvantage is that you constantly need to update them to the
>>>> variant of blast plus version of the output file format.
>>>>
>>> I think it is much better to have this housekeeping done once centrally
>>> in a Bio* library than re-invented by everyone who parses BLAST output.
>>> However, the NCBI BLAST XML output has been fairly stable, and the
>>> new output has a formal schema, so it should be even more dependable.
>>>
>>> Peter
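As a rough illustration of Andreas's point that XML parsing is easy in Java, the JDK's built-in DOM parser is enough to pull hits out of BLAST XML. This sketch assumes the element names of the legacy NCBI BLAST XML format (`Hit`, `Hit_id`, `Hit_def`, `Hsp`, `Hsp_evalue`, `Hsp_bit-score`); the newer schema-based output may use different element names and namespaces, so it is not covered here. `XmlHit` and `BlastXmlHits` are hypothetical names, and Java 16+ is assumed for records.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Best-scoring HSP of one hit (hypothetical type). */
record XmlHit(String id, String description, double evalue, double bitScore) {}

final class BlastXmlHits {
    /** Returns one XmlHit per Hit element, keeping the HSP with the lowest e-value. */
    static List<XmlHit> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Do not download the NCBI DTD named in the DOCTYPE line.
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        List<XmlHit> hits = new ArrayList<>();
        NodeList hitNodes = doc.getElementsByTagName("Hit");
        for (int i = 0; i < hitNodes.getLength(); i++) {
            Element hit = (Element) hitNodes.item(i);
            NodeList hsps = hit.getElementsByTagName("Hsp");
            Element best = null;
            double bestEvalue = Double.POSITIVE_INFINITY;
            for (int j = 0; j < hsps.getLength(); j++) {
                Element hsp = (Element) hsps.item(j);
                double evalue = Double.parseDouble(text(hsp, "Hsp_evalue"));
                if (best == null || evalue < bestEvalue) {
                    best = hsp;
                    bestEvalue = evalue;
                }
            }
            if (best == null) continue; // hit without HSPs: nothing to report
            hits.add(new XmlHit(text(hit, "Hit_id"), text(hit, "Hit_def"),
                bestEvalue, Double.parseDouble(text(best, "Hsp_bit-score"))));
        }
        return hits;
    }

    /** Trimmed text of the first descendant with the given tag, or "" if absent. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```

Picking the lowest e-value explicitly avoids relying on the order in which HSPs are listed inside each hit.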
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at mailman.open-bio.org
>>> http://mailman.open-bio.org/mailman/listinfo/biojava-dev
>>>
>>
>>
>
>
>
>