[Bioperl-l] SearchIO::blast and -b 0
Brian Desany
bdesany@bcm.tmc.edu
Sat, 23 Mar 2002 16:44:41 -0600
I routinely do BLASTs where I supress output of the alignments in favor of
getting just a list of hits (each with a score and a significance), using
the -b 0 option of NCBI blastall -p blastn. This saves me space, plus I'm
only interested if certain things match each other with a certain
significance - the exact alignment doesn't matter to me.
I found that in Bio::SearchIO::blast, hit instanstiation and hsp
instantiation are tightly coupled and triggered by the presence of an
alignment - if there is no alignment/hsp, then no hit is created in the
result object. I guess this is because in a report there is first a list of
hits and then a list of hsps, and usually the list of hsps is redundant with
the list of hits.
Anyway I made a little change to Bio::SearchIO::blast::next_result which
detects if there have been hits in the absence of alignments, and if so
creates hits in the result (that have no associated hsp).
Here is the patch:
***************
*** 201,206 ****
--- 201,207 ----
my $reporttype;
$self->start_document();
my @hit_signifs;
+ my @hit_descs;
while( defined ($_ = $self->_readline )) {
next if( /^\s+$/); # skip empty lines
next if( /CPU time:/);
***************
*** 252,257 ****
--- 253,259 ----
! /^\s+$/ ) {
my @line = split;
push @hit_signifs, [ pop @line, pop @line];
+ push @hit_descs, [ shift @line, join ' ', @line];
}
} elsif( /Sequences producing High-scoring Segment Pairs:/ ) {
# skip the next line
***************
*** 262,267 ****
--- 264,270 ----
my @line = split;
pop @line; # throw away first number which is for 'N'col
push @hit_signifs, [ pop @line, pop @line];
+ push @hit_descs, [ shift @line, join ' ', @line];
}
} elsif ( /^Database:\s*(.+)$/ ) {
my $db = $1;
***************
*** 374,379 ****
--- 377,410 ----
$self->element({'Name' => 'Hsp_hit-frame',
'Data' => $hitframe});
} elsif( /^Parameters:/ || /^\s+Database:/ ) {
+ unless (defined (@{$self->_eventHandler->{'_hits'}})) {
+ while (@hit_signifs) {
+
+ $self->in_element('hsp') && $self->end_element({'Name' =>
'Hsp'});
+ $self->in_element('hit') && $self->end_element({'Name'
=> 'Hit'});
+
+ $self->start_element({ 'Name' => 'Hit'});
+
+ my $h = shift @hit_descs;
+ $self->element({ 'Name' => 'Hit_id',
+ 'Data' => $h->[0]});
+ my @pieces = split(/\|/,$h->[0]);
+ my $acc = pop @pieces;
+ $self->element({ 'Name' => 'Hit_accession',
+ 'Data' => $acc});
+ $self->element({ 'Name' => 'Hit_def',
+ 'Data' => $h->[1]});
+
+
+ my $v = shift @hit_signifs;
+ if( defined $v ) {
+ $self->element({'Name' => 'Hit_signif',
+ 'Data' => $v->[0]});
+ $self->element({'Name' => 'Hit_score',
+ 'Data' => $v->[1]});
+ }
+ }
+ }
$self->in_element('hsp') && $self->end_element({'Name' =>
'Hsp'});
$self->in_element('hit') && $self->end_element({'Name' =>
'Hit'});
my $blast = ( /Parameters/ ) ? 'wublast' : 'ncbi';
I'd be happy to commit this myself if you want to give me an account,
otherwise do with it what you will :)
Thanks,
-Brian.
--------------------------------------------
Brian A. Desany, Ph.D.
Dictyostelium Genome Sequencing Project
Baylor College of Medicine
bdesany@bcm.tmc.edu
(713)798-5639
http://www.dictygenome.org
--------------------------------------------