[Bioperl-l] Bio::SearchIO parsing of WuBLASTX reports
Mike Croning
mdr@sanger.ac.uk
Thu, 18 Apr 2002 16:45:45 +0100 (BST)
Hi Guys
I am trying to use SearchIO to parse WuBLASTX results, and wonder if I am
doing something obviously wrong. The problem is that the $hit->next_hsp
method often returns no HSPs for a hit, even though they are clearly present
in the output. There seems to be no obvious (to me) relationship to score or
other properties of the match.
Here is the code:
sub parse_WuBLASTX_results2 {
    my ($fh, $hit_hash_ref) = @_;
    my $searchio;
    if (defined($fh)) {
        $searchio = new Bio::SearchIO('-format' => 'blast', -fh => $fh);
        print "searchio: ", ref($searchio), "\n";
    } else {
        return;
    }
    my $blast = $searchio->next_result;
    print "blast: ", ref($blast), "\n";
    my $query_length = $blast->query_length;
    print "Query name : ", $blast->query_name, "\n";
    print "Query length: ", $blast->query_length, "\n";
    print $blast->query_description, "\n";
    print "\n\n";
    while (my $hit = $blast->next_hit) {
        my $hsp_counter = 0;
        my $match_pos_string = '0' x $query_length;
        my @fields = split(/\s+/, $hit->description);
        my $Hit_accession = $fields[0];
        $Hit_accession =~ s/\.\d+//;
        my $total_score = 0;
        my @hsps;
        print $Hit_accession, " ";
        while (my $hsp = $hit->next_hsp) {
            $hsp_counter++;
            print "Strand: ";
            print $hsp->strand;
            push(@hsps, $hsp);
            $total_score += $hsp->score;
        }
        print " HSPs: $hsp_counter\n";
        unless (length($match_pos_string) == $query_length) {
            warn "Match pos string length has changed\n";
        }
        #$$hit_hash{'accession'} = [a Bio::Search::Hit::GenericHit, total
        #    score, match_pos_string, \@hsps];
        $$hit_hash_ref{$Hit_accession} = [$hit, $total_score,
                                          $match_pos_string, \@hsps];
    }
    return 1;
}
And the output:
searchio: Bio::SearchIO::blast
blast: Bio::Search::Result::GenericResult
Query name : TNeu_22_29_1
Query length: 673
P19945 HSPs: 0
P14869 HSPs: 0
P05388 HSPs: 0
Q9BVK4 HSPs: 0
P47826 HSPs: 0
Q9PV90 HSPs: 0
Q95140 HSPs: 0
Q96FQ9 HSPs: 0
Q8WQJ2 HSPs: 0
Q9U3U0 HSPs: 0
Q93572 HSPs: 0
P19889 Strand: 1Strand: 1 HSPs: 2
Q96TJ5 HSPs: 0
Q9NHP0 Strand: 1Strand: 1 HSPs: 2
P05317 HSPs: 0
Q9C3Z6 HSPs: 0
<snip>
The file parses fine with BPlite.
Thanks
Mike Croning
'Grunt Programmer'
Vertebrate Sequence Analysis
The Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK
Tel: +44 (0)1223 834244
Fax: +44 (0)1223 494919