[Bioperl-l] standalone blast outformat

Andreas Leimbach andreas.leimbach at uni-wuerzburg.de
Wed Oct 30 10:18:15 UTC 2013


Hey Dimitark,

the custom format for blast-formats 6, 7, and 10 are actually quite new, 
two years or so (I reckon). I don't think anybody implemented them in 
BioPerl and its probably not appropriate as you can dynamically adapt 
your outfile according to the format specifiers. E.g. 'qcov' and the 
others can be in any column of the resulting tab-delimited output file. 
Blast result tiling is not so trivial if you want to do it manually.
There are two things you can do:
1.) Use the full blast output instead of the abbreviated tab-delimited 
outputs. Then you can calculate the query coverage with a BioPerl 
internal method (which calls 'tile_hsps' and tiles all the HSPs 
automatically):
$hit->frac_aligned_query();
2.) If your blast output files are too big, you can get the length of 
each query either from the tab-delimited file or from the query fasta 
and feed it into the objects hash. Morgan Langille described this here:
http://betascience.blogspot.de/2010/01/filtering-blast-hits-by-coverage-using.html

HTH,
Andreas

--
Andreas Leimbach
Universität Münster
Institut für Hygiene
Mendelstr. 7
D-48149 Münster
Germany

Tel.: +49 (0)551 39 33843
E-Mail: andreas.leimbach at uni-wuerzburg.de

On 30.10.13 10:32, dimitark at bii.a-star.edu.sg wrote:
> Hi Florent and Jason,
> Thank you for your replies.
>
> well i spent quite some time to figure that out. But still no results.
> Yes i use BLAST+. So I tried BLASTN as it is, without bioperl. Just to
> see if i can do: blastn -outfmt "7 qcovs other_things". Well it works
> like this. But in bioperl can not pass do that. I dont know why tho.
>
> After i tried to understand how tile_hsps() works. But no results.
> Somehow the documentation on that matter is not clear and not a simple
> example is present. So i could not figure out how to use it.
>
> Now i am trying to figure out how to use FASTA or SSEARCH to solve my
> problem. I suppose i have to get the sequences of the HSPs representing
> my hit and then run ssearch like this:
>
>     ssearch my_query_seq.fa my_hit_hsps.fa
>
> Then parse the output. Just have to figure out what exactly do i need
> from the output :)
>
> Thank you again
> D.
>
> Quoting Jason Stajich <jason.stajich at gmail.com>:
>
>> Assuming 1 HSP you can do:
>>
>> my $qcov = $hsp->query->length / $query->length;
>>
>> Otherwise you have to add up the HSPs and their coverage (and remove
>> redundancy of overlapping hits) to determine what parts of the query
>> were aligned across all the HSPs that were found. You can try the
>> tile_hsps function to also get this. Or if you are using WUBLAST you
>> can get the -links option to grab the logical groups of
>> non-overlapping HSPs to tile across.
>>
>> Or use FASTA or SSEARCH which will provide a single longest path
>> through the alignment.
>>
>>
>> On Oct 29, 2013, at 10:12 PM, Florent Angly <florent.angly at gmail.com>
>> wrote:
>>
>>> Hi Dimitar,
>>>
>>> I suppose you are using BLAST+, not legacy BLAST. There is a HOWTO
>>> here that can get you started:
>>> http://www.bioperl.org/wiki/HOWTO:BlastPlus
>>> Regarding your issue, I had a look at the blastn manual, and I doesn't
>>> look like you can write '-outfmt "7 qcovs ..."' to me. If anything, it
>>> would be '-outfmt "qcovs ..."'.
>>> So, maybe try to get your command to work as intended in a terminal
>>> before putting it in BioPerl.
>>> Best,
>>>
>>> Florent
>>>
>>>
>>> On Wed, Oct 30, 2013 at 1:45 PM,  <dimitark at bii.a-star.edu.sg> wrote:
>>>> Hi guys,
>>>> i was wondering how can i get the query coverage when blasting with
>>>> Bioperl.
>>>> So i figure out one way is to specify the outformat to tabular(7)
>>>> and the
>>>> specify that i want the qcovs(query coverage per subject). From
>>>> BLAST manual
>>>> i see one can do: -outfmt "7 qcovs ..." but in Bioperl i can only say:
>>>> -outformat => 7. If i do: outformat => "7 qcovs" i get an error.
>>>>
>>>> Well another option i suppose is to get the lengths of all hsps from
>>>> a hit
>>>> and estimate the query coverage that way but im not exactly sure if
>>>> this is
>>>> correct way.
>>>>
>>>> Any idea how can i get the query coverage?
>>>>
>>>> Thank you very much for your time and any hints
>>>>
>>>> Regards
>>>> Dimitar
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



More information about the Bioperl-l mailing list