[Bioperl-l] SearchIO - Stop throwing away data

Chris Fields cjfields at uiuc.edu
Mon Jul 24 15:48:45 UTC 2006


> Hi
> 
> I am developing someone
> else's work. I wondered whether anyone could identify the
> mistake that the previous coder made.
> I am not very familiar with SearchIO yet.
> 
> They are trying to extract filenames from an output report.
> This is their code:
> 
>      # store the query name of the mito db blast hits into an array
>      my $searchio = new Bio::SearchIO( -file   => $blast_mito_output );
>      # array to store the mitochondrial BLAST database hits
>      my @mito_hits;
>      # name of query for BLAST hit
>      my $query_name;
> 

Just as a gripe: you should always set '-format' to 'blast' for BLAST text output.

my $searchio = Bio::SearchIO->new(-file   => $blast_mito_output,
                                  -format => 'blast' );


The default is still BLAST text, so your original code works, but that may
well change in the future.

Each BLAST report is a Result. Each Result contains zero or more hits, and each
hit contains one or more HSPs. SearchIO only parses the information contained
in the BLAST report itself, so it knows nothing about filenames. From your
description, though, it looks like you want Hit information. The code below
copies query_name from the BlastResult object $result, i.e. the name of your
query sequence (the one you submitted for BLAST'ing against the database).
Because that call sits inside the hit loop, the same query name is pushed once
for every hit. You need the BlastHit data from $hit instead.
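To make the difference concrete, here is a small sketch in plain Perl with no BioPerl involved. Ordinary hashes stand in for the SearchIO objects; the keys query_name, hits and name, and the sequence names, are hypothetical and are not BioPerl's API. One Result holds three hits: reading the result-level name inside the hit loop yields the same value three times, while reading the hit-level name yields one entry per hit.

```perl
use strict;
use warnings;

# Plain hashes standing in for BioPerl objects (hypothetical structure):
# one result (query) containing three hits.
my @results = (
    { query_name => 'contig_42',
      hits       => [ { name => 'mito_seq_1' },
                      { name => 'mito_seq_2' },
                      { name => 'mito_seq_3' } ] },
);

my (@buggy, @fixed);
for my $result (@results) {
    for my $hit ( @{ $result->{hits} } ) {
        # original code: result-level field, same value for every hit
        push @buggy, $result->{query_name};
        # corrected code: hit-level field, one distinct value per hit
        push @fixed, $hit->{name};
    }
}

print "buggy: @buggy\n";   # buggy: contig_42 contig_42 contig_42
print "fixed: @fixed\n";   # fixed: mito_seq_1 mito_seq_2 mito_seq_3
```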

Change:

       $query_name = $result->query_name();
       #print "\nQuery $query_name\n";
       push(@mito_hits, $query_name);

To:

       my $hit_name = $hit->description();
       #print "\nHit $hit_name\n";
       push(@mito_hits, $hit_name);

or, for the hit accession, use:

       my $hit_name = $hit->accession();

For all accessions in the description (there may be several if sequences
are identical), use an array:

       my @hit_names = $hit->get_all_accessions();
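Putting the pieces together, a complete loop might look like the sketch below. It assumes a BLAST text report and collects $hit->name (the hit identifier); swap in description or accession depending on which field actually holds your filenames. The subroutine name collect_hit_names is hypothetical. Bio::SearchIO is loaded with require inside the sub rather than use at the top only so the file compiles on a machine without BioPerl; an ordinary "use Bio::SearchIO;" works just as well.

```perl
use strict;
use warnings;

# Hypothetical helper: return the name of every hit in a BLAST text
# report, one entry per hit, across all results (queries).
sub collect_hit_names {
    my ($file) = @_;
    -r $file or die "Cannot read BLAST report '$file'\n";

    # BioPerl is loaded when the sub is first called
    require Bio::SearchIO;
    my $searchio = Bio::SearchIO->new(-file   => $file,
                                      -format => 'blast');

    my @names;
    while ( my $result = $searchio->next_result ) {
        while ( my $hit = $result->next_hit ) {
            # hit-level data, not $result->query_name
            push @names, $hit->name;
        }
    }
    return @names;
}
```

Called as my @mito_hits = collect_hit_names($blast_mito_output);, this returns every hit from every query in the report rather than one query name repeated per hit.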

If you only need hit-level data, you can attach a different event handler to
speed things up:

use Bio::SearchIO;
use Bio::SearchIO::FastHitEventBuilder;

my $searchio = Bio::SearchIO->new(-format => $format, -file => $file);
$searchio->attach_EventHandler(Bio::SearchIO::FastHitEventBuilder->new);

For this to work, though, you need to update to the latest CVS version of
BioPerl; a bug affecting it was fixed only recently.

Chris

> while ( my $result = $searchio->next_result() ) {
>     # get the hits and their associated name
>     # do not want to include these in the clustering step
>     while( my $hit = $result->next_hit ) {
>       # store the names of these hits into an array
>       # these filenames will not be copied over
>       $query_name = $result->query_name();
>       #print "\nQuery $query_name\n";
>       push(@mito_hits, $query_name);
>     }
> }


> I think they have based it on the code at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Authors
> 
> use Bio::SearchIO;
> use Bio::SearchIO::FastHitEventBuilder;
> my $searchio = new Bio::SearchIO(-format => $format, -file => $file);
> 
> $searchio->attach_EventHandler(Bio::SearchIO::FastHitEventBuilder->new);
> while( my $r = $searchio->next_result ) {
>   while( my $h = $r->next_hit ) {
>     # Hits will NOT have HSPs
>     print $h->significance,"\n";
>   }
> }
> 
> which "throws away data you don't want"???
> 
> I am finding that our code picks up only the last file name in the output
> report, rather than each and every one. I suspect it is overwriting (or
> throwing away) the data.
> 
> How do I need to change the code to make sure *every* file name goes
> into @mito_hits?
> 
> Thank you
> 
> Jayne
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



