[Bioperl-l] SearchIO::blast and -b 0

Brian Desany bdesany@bcm.tmc.edu
Sat, 23 Mar 2002 16:44:41 -0600


I routinely do BLASTs where I supress output of the alignments in favor of
getting just a list of hits (each with a score and a significance), using
the -b 0 option of NCBI blastall -p blastn. This saves me space, plus I'm
only interested if certain things match each other with a certain
significance - the exact alignment doesn't matter to me.

I found that in Bio::SearchIO::blast, hit instanstiation and hsp
instantiation are tightly coupled and triggered by the presence of an
alignment - if there is no alignment/hsp, then no hit is created in the
result object. I guess this is because in a report there is first a list of
hits and then a list of hsps, and usually the list of hsps is redundant with
the list of hits.

Anyway I made a little change to Bio::SearchIO::blast::next_result which
detects if there have been hits in the absence of alignments, and if so
creates hits in the result (that have no associated hsp).

Here is the patch:


***************
*** 201,206 ****
--- 201,207 ----
     my $reporttype;
     $self->start_document();
     my @hit_signifs;
+    my @hit_descs;
     while( defined ($_ = $self->_readline )) {
         next if( /^\s+$/); # skip empty lines
         next if( /CPU time:/);
***************
*** 252,257 ****
--- 253,259 ----
                  ! /^\s+$/ ) {
               my @line = split;
               push @hit_signifs, [ pop @line, pop @line];
+            push @hit_descs, [ shift @line, join ' ', @line];
           }
         } elsif( /Sequences producing High-scoring Segment Pairs:/ ) {
           # skip the next line
***************
*** 262,267 ****
--- 264,270 ----
               my @line = split;
               pop @line; # throw away first number which is for 'N'col
               push @hit_signifs, [ pop @line, pop @line];
+            push @hit_descs, [ shift @line, join ' ', @line];
           }
         } elsif ( /^Database:\s*(.+)$/ ) {
           my $db = $1;
***************
*** 374,379 ****
--- 377,410 ----
           $self->element({'Name' => 'Hsp_hit-frame',
                           'Data' => $hitframe});
         } elsif(  /^Parameters:/ || /^\s+Database:/ ) {
+            unless (defined (@{$self->_eventHandler->{'_hits'}})) {
+             while (@hit_signifs) {
+
+                 $self->in_element('hsp') && $self->end_element({'Name' =>
'Hsp'});
+                   $self->in_element('hit') && $self->end_element({'Name'
=> 'Hit'});
+
+                 $self->start_element({ 'Name' => 'Hit'});
+
+                 my $h = shift @hit_descs;
+                 $self->element({ 'Name' => 'Hit_id',
+                                  'Data' => $h->[0]});
+                 my @pieces = split(/\|/,$h->[0]);
+                 my $acc = pop @pieces;
+                 $self->element({ 'Name' =>  'Hit_accession',
+                                  'Data'  => $acc});
+                 $self->element({ 'Name' => 'Hit_def',
+                                  'Data' => $h->[1]});
+
+
+                 my $v = shift @hit_signifs;
+                 if( defined $v ) {
+                     $self->element({'Name' => 'Hit_signif',
+                                     'Data' => $v->[0]});
+                     $self->element({'Name' => 'Hit_score',
+                                     'Data' => $v->[1]});
+                 }
+             }
+         }
           $self->in_element('hsp') && $self->end_element({'Name' =>
'Hsp'});
           $self->in_element('hit') && $self->end_element({'Name' =>
'Hit'});
           my $blast = ( /Parameters/ ) ? 'wublast' : 'ncbi';


I'd be happy to commit this myself if you want to give me an account,
otherwise do with it what you will :)

Thanks,
-Brian.
--------------------------------------------
Brian A. Desany, Ph.D.
Dictyostelium Genome Sequencing Project
Baylor College of Medicine
bdesany@bcm.tmc.edu
(713)798-5639
http://www.dictygenome.org
--------------------------------------------