[Bioperl-l] help using SEARCH IO to parse blast results

Smithies, Russell Russell.Smithies at agresearch.co.nz
Tue Nov 27 19:31:29 UTC 2007


Do the hits need to be sorted first or is this done automagicly?
I ask this as I know Megablast doesn't provide sorted output for most of
it's formats.

Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-
> bio.org] On Behalf Of Dave Messina
> Sent: Wednesday, 28 November 2007 6:56 a.m.
> To: alison waller
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results
> 
> Hi Alison,
> As Sendu mentioned, the key bit is adding a condition to the hit loop
to
> limit the number of hits that are printed. I didn't test the below
> extensively, but give it a try...
> 
> 
> Dave
> 
> 
> 
> #!/usr/local/bin/perl -w
> 
> # Parsing BLAST reports with BioPerl's Bio::SearchIO module
> # alison waller November 2007
> 
> use strict;
> use warnings;
> use Bio::SearchIO;
> 
> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of
hits>\n";
> if (@ARGV != 2) { die $usage; }
> 
> my $infile  = $ARGV[0];
> my $outfile = $infile . '.parsed';
> my $tophit  = $ARGV[1]; # to specify in the command line how many hits
>                         # to report for each query
> 
> #open(  IN, $infile )     || die "Can't open inputfile $infile! $!\n";
> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile!
$!\n";
> 
> my $report = new Bio::SearchIO(
>     -file   => "$infile",
>     -format => "blast"
> );
> 
> print OUT
>
"Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tga
ps\t
> Qstrand\tHstrand\n";
> 
> # Go through BLAST reports one by one
> while ( my $result = $report->next_result ) {
>     my $i = 0;
>     while (  ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) {
>         while ( my $hsp = $hit->next_hsp ) {
> 
>             # Print some tab-delimited data about this hit
>             print OUT $result->query_name,     "\t";
>             print OUT $hit->name,              "\t";
>             print OUT $hit->significance,      "\t";
>             print OUT $hit->bits,              "\t";
>             print OUT $hsp->evalue,            "\t";
>             print OUT $hsp->percent_identity,  "\t";
>             print OUT $hsp->length('total'),   "\t";
>             print OUT $hsp->num_identical,     "\t";
>             print OUT $hsp->gaps,              "\t";
>             print OUT $hsp->strand('query'),   "\t";
>             print OUT $hsp->strand('hit'),     "\n";
>         }
>     }
> 
>     if ($i == 0) { print OUT "no hits\n"; }
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================




More information about the Bioperl-l mailing list