[Bioperl-l] parsing a BLAST output
Jason Stajich
jason.stajich at duke.edu
Sun Dec 4 23:00:06 EST 2005
frac_identical gives the fraction of bases in the HSP that are
identical.
overlap would be calculated from (length of the query aligned) /
(length of query) or (length of hit aligned) / (length of hit).
So for an HSP you can calculate this
$fracqaligned = $hsp->query->length / $result->query_length;
$frachaligned = $hsp->hit->length / $hit->length;
But remember there may be multiple HSPs so you may need to merge the
HSP information to get the total length aligned. If there is a
repeated domain or multiple high scoring suboptimal alignments this
can cause things to get a little tricky.
There are two methods provided in the HitI interface called
frac_aligned_query() and frac_aligned_hit() do try and take all of
this into account for you, but I admit this is less well tested
code. But do give them a try:
$fracqaligned = $hit->frac_aligned_query();
$frachaligned = $hit->frac_aligned_hit();
If you are using WU-BLASTP add the -postsw option to get a refined
alignment which will merge HSPs where appropriate so you should use
that.
You can also use the -links option to get WU-BLAST to get the logical
ordering and a consistent path through the HSPs.
On Dec 4, 2005, at 8:32 PM, Angshu Kar wrote:
> Hi,
>
> To begin with, I'm new to Bioperl.
> Now, I've written the following simple piece of code to parse a WU-
> Blast
> output which filters data *for a given e-value and >50% overlap*.
>
> I'm writing the main algorithm here:
>
> my $blast_report = $ARG[1];
> my $threshold_evalue = $ARG[2];
>
> my $in = new Bio::SearchIO(-format => 'blast', -file =>
> $blast_report);
>
> while (my $result = $in -> next_result)
> {
> while(my $hit = $result->next_hit)
> {
> if(($line{$hit->name} == $line{$result->query_accession}))
> {
> next;
> }
> if($hit->hsp->evalue <= $threshold_evalue)
> {
> if($hit->hsp->frac_indentical>=0.5)
> {
> print $line{$result->query_accession} . "\t" .
> $line{$hit->name} . "\t" . $hit->hsp-evalue . "\n";
> }
> }
> }
> }
>
> My questions are:
>
> 1. does the frac_identical gives the measure of % overlap? Or, are
> there any
> other methods?
> 2. now, i don't have any blast data sets to test my code upon.could
> any of
> the experienced users let me know whether the algorithm is fine?any
> tip-offs on any point (from optimization to syntactical errors) are
> heartily
> welcome.
> 3. could any one please let me know if i can find sample wu-blast
> outputs to
> test my script upon?
http://fungal.genome.duke.edu/~jes12/BGT203.2005/sample_reports/
Also checkout the biodata repository from bioperl and look in the
DB_Searching directory, we had started a project cataloging example
reports in all the different formats. This sort of fizzled out, but
could still use some volunteers to better organize things and
incorporate more examples.
http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biodata/
DB_Searching/
> Appreciate your guidance.
>
> Thanks,
> Angshu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12
More information about the Bioperl-l
mailing list