[Bioperl-l] parsing blast to GFF

Charles Hauser chauser@duke.edu
21 Nov 2002 17:13:51 -0500


All,

Trying to learn how to use bioperl and at the same time parse blast data
to GFF.

I can't quite get my head around how to output ONLY the features I want.

Modifying the script Jason posted a couple of days ago (see below) I
generate the following output:

20021010.8247.2 BLASTX_sc	similarity	825	1145	.	+	.	Target "Protein:NR_SC:PIR-S55962"

Questions:

(1)  I can't seem to recover the frame,If I uncomment the line:
	-add_tag_value => {'Frame', $hsp->frame()}
I get the error: 
Can't locate object method "report_type" via package
"Bio::Search::HSP::GenericHSP" (perhaps you forgot to load
"Bio::Search::HSP::GenericHSP"?) at
/usr/lib/perl5/site_perl/5.6.1/Bio/Search/HSP/GenericHSP.pm line 633,
<FH> line 134.

(2) none of the 'add_tag_value' features are printing?

(3) not clear what the decimal in: '	.	+	.	' represent.  The '+' is
reading frame, I believe.


regards,

Charles


#!/usr/bin/perl 
use strict;
use warnings;
use Bio::SearchIO;
use Bio::Tools::GFF;
use Math::BigFloat;

my $filename = shift;
if( $filename =~ /\.gz/ ) {
    open(FH, "zcat $filename |") || die("cannot run zcat on $filename:
$!");
} else {
    open(FH, $filename) || die("cannot open $filename: $!");
}
my $in = new Bio::SearchIO(-fh => \*FH,
                           -format => 'blast');
my $out = new Bio::Tools::GFF(-file => ">nr_blastx.gff");

my $evalue_cutoff = Math::BigFloat->new('1e-15');
while( my $r = $in->next_result ) {
    while( my $hit = $r->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
            my $evalue = $hsp->evalue;
	    # deal with time when NCBI blast doesn't report 1e-200 but e-200
            $evalue =~ s/^e/1e/;
            next if Math::BigFloat->new($evalue) > $evalue_cutoff;
	    my $feature = new Bio::SeqFeature::Generic(
						       -seqname => $r->query_name(),
						       -source => 'BLASTX_sc',
						       -primary => 'similarity',
						       -start => $hsp->query->start,
						       -end   => $hsp->query->end,
						       -strand => $hsp->query->strand,
						       -tag => {'Target', sprintf("Protein:%s",$hit->accession || $hit->name)},
						       -add_tag_value => {'Target', sprintf("%.2f",$hsp->percent_identity)},
						       -add_tag_value => {'Target', $hsp->hit->start},
						       -add_tag_value => {'Target', $hsp->hit->end},
#			                               -add_tag_value => {'Frame', $hsp->frame()},
						       -add_tag_value => {'note', $hit->description}
						       );
	    $out->write_feature($feature);
        }
    }
}