[Biojava-dev] Fwd: [blast-announce] New Version of BLAST XML output

Paolo Pavan paolo.pavan at gmail.com
Thu May 14 08:00:17 UTC 2015


OK, I will put it online as soon as possible.

Bye bye,
Paolo

2015-05-13 1:21 GMT+02:00 Andreas Prlic <andreas at sdsc.edu>:

> Hi Paolo,
>
> I definitely don't want to curb your enthusiasm. What about packaging this
> up as a new module in your fork? Then we can review the code base and take
> it from there. Some criteria to review this on are A) Ease of maintenance
> B) extensibility for BLAST variants C) general applicability for any
> database searches (potential to hook up BLAST alternatives)
>
> Andreas
>
>
>
> On Sun, May 10, 2015 at 5:04 AM, Paolo Pavan <paolo.pavan at gmail.com>
> wrote:
>
>> Hello!
>> I obviously share the opinion of Peter and Jose. Moreover, as already
>> written, I have used this new feature in a second project that I could
>> also describe and submit to BioJava, if it is of any interest.
>>
>> About Andreas' questions:
>> "Does your module support psiblast, rpsblast, tblastx and blast+ and
>> what versions?": At present, it supports blastn, blastp, blastx, tblastn
>> and tblastx, version 2.2.29. I'm not sure about psiblast and rpsblast;
>> I should test them.
>> But it has been designed so that updating a single parser (or adding a
>> new search program while staying within the framework) requires writing
>> just a single class. This keeps the code simple and neat, which in my
>> opinion is very important for future developers.
>>
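To illustrate the "one class per parser" idea above, here is a minimal Java sketch. It is not the actual module: SearchResultParser, TabularParser, ParserRegistry and the format name "blast-tab" are hypothetical names made up for this example. The only outside fact it relies on is that default BLAST tabular output carries the subject id in the second column.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical extension point: one implementation per output format. */
interface SearchResultParser {
    /** Short name used to look the parser up, e.g. "blast-tab". */
    String formatName();

    /** Returns the hit identifiers found in the input, in file order. */
    List<String> parseHitIds(Reader in);
}

/** Example implementation for tab-separated output (hit id in column 2). */
final class TabularParser implements SearchResultParser {
    public String formatName() {
        return "blast-tab";
    }

    public List<String> parseHitIds(Reader in) {
        List<String> ids = new ArrayList<>();
        try {
            BufferedReader br = new BufferedReader(in);
            String line;
            while ((line = br.readLine()) != null) {
                // Skip blank lines and comment lines.
                if (line.isEmpty() || line.startsWith("#")) continue;
                String[] cols = line.split("\t");
                if (cols.length > 1) ids.add(cols[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ids;
    }
}

/** Registry that client code talks to; it never names a concrete parser. */
final class ParserRegistry {
    private final Map<String, SearchResultParser> parsers = new HashMap<>();

    void register(SearchResultParser p) {
        parsers.put(p.formatName(), p);
    }

    SearchResultParser forFormat(String name) {
        SearchResultParser p = parsers.get(name);
        if (p == null) {
            throw new IllegalArgumentException("No parser for format: " + name);
        }
        return p;
    }
}
```

Supporting another program or format would then mean adding one more class that implements SearchResultParser and registering it; calling code keeps using forFormat(...).
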
>> "the disadvantage is that you constantly need to update them to the
>> variant of blast plus version of the output file format": unfortunately
>> this is a problem that every one of us has to face if we want to use
>> new NCBI programs. It happened with legacy BLAST, it has happened many
>> times with the GenBank format, and it is happening with BLAST+. I just
>> hope that they will have the kindness to make the format version explicit
>> inside the XML, if not to name the program itself differently, for
>> example blast3 or blast++, to avoid confusion. We can't do anything about
>> that; we can just try to make things simple and easy to reuse.
>>
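Until the format version is stated explicitly, one workaround is to look at the root element before choosing a parser. A rough sketch with the JDK's StAX API follows. XmlRootSniffer is a hypothetical helper; BlastOutput is the root of the legacy XML, while the root name used here for the new format (BlastOutput2) is an assumption that should be checked against NCBI's schema.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper that peeks at the root element of an XML document. */
final class XmlRootSniffer {

    /** Returns the local name of the document's root element. */
    static String rootElementName(Reader in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Disable DTD processing; we only need the first start tag.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        try {
            XMLStreamReader xml = factory.createXMLStreamReader(in);
            try {
                while (xml.hasNext()) {
                    if (xml.next() == XMLStreamConstants.START_ELEMENT) {
                        return xml.getLocalName();
                    }
                }
            } finally {
                xml.close();
            }
            throw new IllegalArgumentException("No root element found");
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Maps a root element name to a coarse format label. */
    static String formatFor(String root) {
        if ("BlastOutput".equals(root)) return "legacy";
        // Assumption: root name of the new format; verify against NCBI's schema.
        if ("BlastOutput2".equals(root)) return "new";
        return "unknown";
    }
}
```

The label returned by formatFor could then be used to pick the matching parser class, so calling code does not need to know which BLAST release wrote the file.
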
>> Just to express my opinion, I think that every bio project should first
>> of all address these "base level" problems before others, to allow
>> developers to focus on higher-level abstractions. I'm sure that this will
>> be appreciated by the community and will increase BioJava's user base.
>>
>> Paolo
>>
>>
>> 2015-05-06 12:15 GMT+02:00 Jose Manuel Duarte <jose.duarte at psi.ch>:
>>
>>> I'd say that having some common data structure to model the output of a
>>> sequence homology search would be beneficial. For instance, a BLAST
>>> alternative might appear one day (I'm eagerly awaiting it!). The common
>>> data structure should be able to model the output of any of the
>>> different programs.
>>>
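A common model along these lines might look roughly like the sketch below. The class and field names (Hsp, Hit, SearchResult, bestEvalue) are illustrative assumptions, not an existing BioJava API; the fields are simply ones that most search tools report.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One aligned region (high-scoring segment pair) between query and hit. */
final class Hsp {
    final double bitScore;
    final double evalue;
    final int queryFrom, queryTo, hitFrom, hitTo;

    Hsp(double bitScore, double evalue,
        int queryFrom, int queryTo, int hitFrom, int hitTo) {
        this.bitScore = bitScore;
        this.evalue = evalue;
        this.queryFrom = queryFrom;
        this.queryTo = queryTo;
        this.hitFrom = hitFrom;
        this.hitTo = hitTo;
    }
}

/** One database sequence matched by the query, with all of its HSPs. */
final class Hit {
    final String id;
    final String description;
    final List<Hsp> hsps;

    Hit(String id, String description, List<Hsp> hsps) {
        this.id = id;
        this.description = description;
        this.hsps = Collections.unmodifiableList(new ArrayList<>(hsps));
    }

    /** Lowest e-value among this hit's HSPs, or +infinity if there are none. */
    double bestEvalue() {
        double best = Double.POSITIVE_INFINITY;
        for (Hsp h : hsps) {
            best = Math.min(best, h.evalue);
        }
        return best;
    }
}

/** All hits for one query, tagged with the program that produced them. */
final class SearchResult {
    final String program; // free text, so non-BLAST tools fit as well
    final String queryId;
    final List<Hit> hits;

    SearchResult(String program, String queryId, List<Hit> hits) {
        this.program = program;
        this.queryId = queryId;
        this.hits = Collections.unmodifiableList(new ArrayList<>(hits));
    }
}
```

Keeping the program name as plain text rather than an enum of BLAST flavours is one way to leave room for tools such as those listed below.
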
>>> There are already some alternatives to blast:
>>>
>>> SANS and SANSparallel by Liisa Holm (
>>> http://www.ncbi.nlm.nih.gov/pubmed/22962464,
>>> http://nar.oxfordjournals.org/content/early/2015/04/08/nar.gkv317.full)
>>> USEARCH (commercial) (http://drive5.com/usearch/)
>>> BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html#blat3)
>>>
>>> In fact SANSparallel looks very promising: it's incredibly fast, though
>>> less sensitive than BLAST.
>>>
>>> Cheers
>>>
>>> Jose
>>>
>>>
>>>
>>>
>>> On 06.05.2015 10:47, Peter Cock wrote:
>>>
>>>> On Wed, May 6, 2015 at 6:02 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>>>>
>>>>> On Tue, May 5, 2015 at 1:18 PM, Paolo Pavan <paolo.pavan at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> As seen in other Bio* projects, alongside the Sequence IO and Alignment
>>>>>> IO procedures it could also have a Search result IO.
>>>>>>
>>>>> I never understood why other Bio* projects have special Blast modules.
>>>>> Perhaps XML parsing is not as easy as it is in Java? Please see the
>>>>> code at the bottom of this message.
>>>>>
>>>> Python at least has a range of XML parsing libraries which are up to the
>>>> task. However, as Paolo wrote:
>>>>
>>>>>> The advantage is to define common data structures that model Hsp,
>>>>>> Hits, Results without having to care about (i.e. abstracting away)
>>>>>> the underlying search program.
>>>>>>
>>>> This is the big advantage of BioPerl and Biopython's SearchIO module.
>>>> You can at least in theory switch between parsing BLAST XML, BLAST
>>>> tabular, BLAST plain text (shudder), or another related format without
>>>> major changes to your code.
>>>>
>>>>> and the disadvantage is that you constantly need to update them to the
>>>>> variant of blast plus version of the output file format.
>>>>>
>>>> I think it is much better to have this housekeeping done once, centrally
>>>> in a Bio* library, than re-invented by anyone parsing the BLAST output.
>>>> However, the NCBI BLAST XML output has been fairly stable, and the
>>>> new output has a formal schema so should be even more dependable.
>>>>
>>>> Peter
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at mailman.open-bio.org
>>>> http://mailman.open-bio.org/mailman/listinfo/biojava-dev
>>>>
>>>
>>
>
>
>
> --
> -----------------------------------------------------------------------
> Dr. Andreas Prlic
> RCSB PDB Protein Data Bank
> Technical & Scientific Team Lead
> University of California, San Diego
>
> Editor Software Section
> PLOS Computational Biology
>
> BioJava Project Lead
> -----------------------------------------------------------------------
>