[Bioperl-l] parsing a BLAST output

Angshu Kar angshu96 at gmail.com
Wed Dec 14 16:29:43 EST 2005


Hi Jason,

So again, what I want is the following:

query sequence
----------------------------------
        |   alignment    |
--------------------------------
hit sequence

The length of the alignment should be at least 50% of the
length of the query sequence, and at least 50% of the length
of the hit sequence.

Now I try to convert it to bioperl language:


   if($hit->length_aln()/hit->length()>=0.5 and
$hit->length_aln()/result->query_length()>=0.5)
                    {
 print $line{$result->query_accession} . "\t" . $line{$hit->name} .
"\t" . $hit->hsp-evalue . "\n";
                    }

Is the statement after "if" correctly represent what I mean?

Appreciate your guidance.

Thanks,
Angshu



On 12/13/05, Angshu Kar <angshu96 at gmail.com> wrote:
> Thanks Jason...
>
>
>
> On 12/13/05, Jason Stajich <jason.stajich at duke.edu> wrote:
> >
> > http://doc.bioperl.org/releases/bioperl-1.4/Bio/Search/Hit/GenericHit.html#POD30
> >
> >
> > http://doc.bioperl.org/releases/bioperl-1.4/Bio/Search/Hit/GenericHit.html#POD31
> >
> > On Dec 13, 2005, at 10:41 AM, Angshu Kar wrote:
> >
> >
> > Hi Jason,
> >
> > Could you please send in the link where I can find something about the HitI interface? I couldn't find those two functions there.
> >
> > Thanks,
> > Angshu
> >
> >
> > On 12/4/05, Jason Stajich <jason.stajich at duke.edu > wrote:
> > > frac_identical gives the fraction of bases in the HSP that are
> > > identical.
> > >
> > > overlap would be calculated from (length of the query aligned) /
> > > (length of query) or (length of hit aligned) / (length of hit).
> > >
> > > So for an HSP you can calculate this
> > > $fracqaligned = $hsp->query->length / $result->query_length;
> > > $frachaligned = $hsp->hit->length / $hit->length;
> > >
> > > But remember there may be multiple HSPs so you may need to merge the
> > > HSP information to get the total length aligned.  If there is a
> > > repeated domain or multiple high scoring suboptimal alignments this
> > > can cause things to get a little tricky.
> > >
> > > There are two methods provided in the HitI interface called
> > > frac_aligned_query() and frac_aligned_hit() do try and take all of
> > > this into account for you, but I admit this is less well tested
> > > code.  But do give them a try:
> > > $fracqaligned = $hit->frac_aligned_query();
> > > $frachaligned = $hit->frac_aligned_hit();
> > >
> > >
> > > If you are using WU-BLASTP add the -postsw option to get a refined
> > > alignment which will merge HSPs where appropriate so you should use
> > > that.
> > >
> > > You can also use the -links option to get WU-BLAST to get the logical
> > > ordering and a consistent path through the HSPs.
> > >
> > >
> > > On Dec 4, 2005, at 8:32 PM, Angshu Kar wrote:
> > >
> > > > Hi,
> > > >
> > >  > To begin with, I'm new to Bioperl.
> > > > Now, I've written the following simple piece of code to parse a WU-
> > > > Blast
> > > > output which filters data *for a given e-value and >50% overlap*.
> > > >
> > > > I'm writing the main algorithm here:
> > > >
> > > > my $blast_report = $ARG[1];
> > > > my $threshold_evalue = $ARG[2];
> > > >
> > > > my $in = new Bio::SearchIO(-format => 'blast', -file =>
> > > > $blast_report);
> > > >
> > > > while (my $result = $in -> next_result)
> > > >    {
> > > >       while(my $hit = $result->next_hit)
> > > >          {
> > > >             if(($line{$hit->name} == $line{$result->query_accession}))
> > > >                {
> > > >                   next;
> > > >                }
> > > >             if($hit->hsp->evalue <= $threshold_evalue)
> > > >                {
> > > >                   if($hit->hsp->frac_indentical>=0.5)
> > > >                      {
> > > >                         print $line{$result->query_accession} . "\t" .
> > > > $line{$hit->name} . "\t" . $hit->hsp-evalue . "\n";
> > > >                     }
> > > >               }
> > > >       }
> > > > }
> > > >
> > > > My questions are:
> > > >
> > > > 1. does the frac_identical gives the measure of % overlap? Or, are
> > > > there any
> > > > other methods?
> > > > 2. now, i don't have any blast data sets to test my code  upon.could
> > > > any of
> > > > the experienced users let me know whether the algorithm is fine?any
> > > > tip-offs on any point (from optimization to syntactical errors) are
> > > > heartily
> > > > welcome.
> > > > 3. could any one please let me know if i can find sample wu-blast
> > > > outputs to
> > > > test my script upon?
> > >
> > > http://fungal.genome.duke.edu/~jes12/BGT203.2005/sample_reports/
> > >
> > > Also checkout the biodata repository from bioperl and look in the
> > > DB_Searching directory, we had started a  project cataloging example
> > > reports in all the different formats.  This sort of fizzled out, but
> > > could still use some volunteers to better organize things and
> > > incorporate more examples.
> > >
> > >  http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biodata/
> > > DB_Searching/
> > >
> > >
> > > > Appreciate your guidance.
> > > >
> > > > Thanks,
> > > > Angshu
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at portal.open-bio.org
> > > >  http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > http://www.duke.edu/~jes12
> > >
> > >
> > >
> >
> >
> >
> >
> >
> > --
> > Jason Stajich
> > Duke University
> > http://www.duke.edu/~jes12
> >
> >
>
>



More information about the Bioperl-l mailing list