[Biopython-dev] Blast XML Parse Improvements

Peter Cock p.j.a.cock at googlemail.com
Wed Jul 29 18:26:18 UTC 2015


Hi Travis,

This is an NCBI "feature", not a Biopython bug.

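Using -subject means BLAST does lots of pairwise comparisons (one
for each query/subject pair), so you get lots of iteration blocks in
the XML. To get the results you probably want (and meaningful
e-values), first make a BLAST database from the subject file with
makeblastdb, then search it with -db instead of -subject. See: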

http://blastedbio.blogspot.co.uk/2012/05/blast-ingoring-search-space-size-for-e.html
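
If you do need to work with existing -subject output as-is, one
workaround is to regroup the iteration blocks by query yourself. Below
is a minimal sketch using only the Python standard library rather than
Biopython's parser. It assumes the "blastxml" (-outfmt 5) layout, that
each Iteration block carries its query name in Iteration_query-def,
and that Hit_def holds the subject name. The function name and the
sample XML are made up for illustration and are not taken from the
gist.

```python
import io
import xml.etree.ElementTree as ET


def group_hits_by_query(source):
    """Collect hit descriptions per query from legacy BLAST XML.

    `source` is a filename or file-like object. With -subject, the same
    query shows up in several Iteration blocks (one per subject), so
    hits are merged under the query description.
    """
    grouped = {}
    root = ET.parse(source).getroot()
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        # setdefault keeps queries that have no hits at all
        hits = grouped.setdefault(query, [])
        for hit in iteration.iterfind("Iteration_hits/Hit"):
            hits.append(hit.findtext("Hit_def"))
    return grouped


# Hypothetical, heavily trimmed sample: two queries against two subjects
SAMPLE = """<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-def>q1</Iteration_query-def>
 <Iteration_hits><Hit><Hit_def>contigA</Hit_def></Hit></Iteration_hits>
</Iteration>
<Iteration><Iteration_query-def>q1</Iteration_query-def>
 <Iteration_hits></Iteration_hits>
</Iteration>
<Iteration><Iteration_query-def>q2</Iteration_query-def>
 <Iteration_hits></Iteration_hits>
</Iteration>
<Iteration><Iteration_query-def>q2</Iteration_query-def>
 <Iteration_hits><Hit><Hit_def>contigB</Hit_def></Hit></Iteration_hits>
</Iteration>
</BlastOutput_iterations></BlastOutput>"""

grouped = group_hits_by_query(io.StringIO(SAMPLE))
print(grouped)  # {'q1': ['contigA'], 'q2': ['contigB']}
```

This only merges the hits; the e-values in -subject output are still
computed per pairwise comparison, so building a database remains the
better fix.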

Regards,

Peter

On Wed, Jul 29, 2015 at 5:19 PM, Travis Wrightsman <twrig002 at ucr.edu> wrote:
> Peter and others,
>
> Here is the gist: https://gist.github.com/twrightsman/a94c66016692fd4295be
>
> The xml output shows the tool version and I included my shell commands used
> to get the results; it is still using the "current" BLAST XML format with
> iteration blocks.
>
> -Travis
>
> On Wed, Jul 29, 2015 at 1:44 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> Right, the new "blastxml2" output in BLAST+ 2.2.31 is a whole new format,
>> but the existing "blastxml" which Biopython already has a parser for is
>> still available as usual via -outfmt 5.
>>
>> And that parser ought to work fine with multiple queries.
>>
>> Travis, can you post more details? e.g. BLAST+ command line used
>> and ideally sample output (on gist.github.com or similar)
>>
>> Thanks,
>>
>> Peter
>>
>> On Wed, Jul 29, 2015 at 6:29 AM, Fields, Christopher J
>> <cjfields at illinois.edu> wrote:
>> > Keep in mind there are significant changes for BLAST+ XML output in
>> > the latest release, not all of them good IMHO.  Peter has a pretty
>> > good writeup on this:
>> >
>> > http://blastedbio.blogspot.com/2015/07/blast-xml-2-include-trouble.html
>> >
>> > chris
>> >
>> > On Jul 29, 2015, at 12:21 AM, Wibowo Arindrarto <w.arindrarto at gmail.com>
>> > wrote:
>> >
>> > Hi Travis,
>> >
>> > There haven't been any new test cases added for BLAST 2.2.30, as I
>> > recall. Did you get the same behavior when using Bio.SearchIO to parse
>> > it?
>> >
>> > Best regards,
>> > Bow
>> >
>> > On Wed, Jul 29, 2015 at 7:12 AM, Travis Wrightsman <twrig002 at ucr.edu>
>> > wrote:
>> >
>> > Biopython Devs,
>> >
>> > I was trying to parse through a tblastn XML output with multiple
>> > queries in the query file and it was generating a BLAST record object
>> > for each contig in the subject file instead of each run. Is this the
>> > intended behavior or has this not been updated for ncbi tools 2.2.30?
>> >
>> > If it needs to be updated or finished, I'm more than happy to rewrite or
>> > update some of the code in NCBIXML for the latest version.
>> >
>> > -Travis
>> >
>> > _______________________________________________
>> > Biopython-dev mailing list
>> > Biopython-dev at mailman.open-bio.org
>> > http://mailman.open-bio.org/mailman/listinfo/biopython-dev
>
>

