[Bioperl-l] Bio::Index::GenBank - by organism?

Chris Fields cjfields at illinois.edu
Tue Nov 10 03:55:01 UTC 2009


On Nov 9, 2009, at 6:05 PM, Jay Hannah wrote:

> Many thanks to Ewan Birney et al. for Bio::Index::*
>
> I can throw away my awful grep-based index-by-accession stuff.   :)
>
> Any chance someone has also written an organism-based index
> mechanism? Something like...
>
> while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) {
>   print $seq->display_id . "\n";
> }
>
> Thanks,
>
> j

It should work via id_parser(); from Bio::Index::GenBank:

    $inx->id_parser(\&get_id);
    # make the index
    $inx->make_index($file_name);

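    # here is where the retrieval key is specified
    sub get_id {
       my $line = shift;
       # return the key only when this line matches, so a stale $1
       # from an earlier match is never returned
       return $line =~ /clone="(\S+)"/ ? $1 : undef;
    }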

Change the code ref to match the line you want and parse the name
out.  Caveat: this may not be perfect, since the parser is passed
only one line at a time and some species lines will wrap.  I'm also
not sure how this would work when multiple sequences from the same
species are present.
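
A minimal sketch of such a parser, assuming the organism name sits on
a single ORGANISM line of each GenBank record. The name get_organism
is made up for illustration and is not part of BioPerl; wrapped
organism names and several records sharing one organism are not
handled here.

```perl
use strict;
use warnings;

# Hypothetical id_parser callback: returns the organism name when the
# line is a GenBank ORGANISM line, and undef for every other line.
sub get_organism {
    my $line = shift;
    return $line =~ /^\s+ORGANISM\s+(\S.*?)\s*$/ ? $1 : undef;
}

# Intended use with an existing Bio::Index::GenBank object
# (requires BioPerl, so it is left commented out here):
#   $inx->id_parser(\&get_organism);
#   $inx->make_index($file_name);

# Quick check on a single sample line.
print get_organism("  ORGANISM  Xanthomonas campestris\n"), "\n";
```

Returning undef for non-matching lines keeps the callback from handing
back a key left over from an earlier match.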

The other option is to preparse everything and tie a hash to store a  
species->UID map, then use that along with your Bio::Index index to  
grab what you need.
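
A rough sketch of the pre-parsing idea, using only core Perl and an
in-memory hash. The helper build_organism_map, the sample records and
the FAKE accession numbers are all made up for illustration, and it
assumes each record has an ACCESSION line before its ORGANISM line.

```perl
use strict;
use warnings;

# Pre-parse GenBank flat-file text into an organism => [accession, ...]
# map. build_organism_map is a hypothetical helper, not part of BioPerl.
sub build_organism_map {
    my $fh = shift;
    my (%by_organism, $acc);
    while (my $line = <$fh>) {
        if ($line =~ /^ACCESSION\s+(\S+)/) {
            $acc = $1;                  # primary accession of this record
        }
        elsif ($line =~ /^\s+ORGANISM\s+(\S.*?)\s*$/ && defined $acc) {
            push @{ $by_organism{$1} }, $acc;
        }
        elsif ($line =~ m{^//}) {
            undef $acc;                 # end of record
        }
    }
    return \%by_organism;
}

# Tiny inline sample standing in for a real GenBank file.
my $sample = <<'END';
LOCUS       TEST1
ACCESSION   FAKE0001
  ORGANISM  Xanthomonas campestris
//
LOCUS       TEST2
ACCESSION   FAKE0002
  ORGANISM  Escherichia coli
//
LOCUS       TEST3
ACCESSION   FAKE0003
  ORGANISM  Xanthomonas oryzae
//
END

open my $fh, '<', \$sample or die "cannot open sample: $!";
my $map = build_organism_map($fh);
close $fh;

# Collect accessions for every organism whose name matches a pattern.
my @hits = map { @{ $map->{$_} } } grep { /Xanthomonas/ } sort keys %$map;
print "$_\n" for @hits;
```

Each accession in @hits could then be handed to the existing index,
for example with $inx->fetch($acc). To keep the map between runs, the
hash could be tied to a DBM file; a tied DBM hash stores flat strings,
so the accession lists would need to be joined into one value first.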

chris


