[Bioperl-l] Bio::Index::GenBank - by organism?

Tue Nov 10 18:50:00 UTC 2009

You might also look at what mygenbank does:
http://homepage.mac.com/iankorf/mygenbank.html

On Nov 9, 2009, at 7:55 PM, Chris Fields wrote:

> On Nov 9, 2009, at 6:05 PM, Jay Hannah wrote:
>
>> Many thanks to Ewan Birney et al. for Bio::Index::*
>>
>> I can throw away my awful grep-based index-by-accession stuff.   :)
>>
>> Any chance someone has also written an organism based index  
>> mechanism? Something like...
>>
>> while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) {
>>  print $seq->display_id . "\n";
>> }
>>
>> Thanks,
>>
>> j
>
> It should work via id_parser(); from Bio::Index::GenBank:
>
>   $inx->id_parser(\&get_id);
>   # make the index
>   $inx->make_index($file_name);
>
>   # here is where the retrieval key is specified
>   sub get_id {
>      my $line = shift;
>      $line =~ /clone="(\S+)"/;
>      $1;
>   }
>
> Change the code ref to deal with the line you want and parse the name  
> out.  Caveat: this may not be absolutely perfect (it only passes in  
> one line at a time, and some species lines will wrap).  I'm also not  
> sure how this would work in cases where multiple sequences from the  
> same species are present.
>
> The other option is to preparse everything and tie a hash to store a  
> species->UID map, then use that along with your Bio::Index index to  
> grab what you need.
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
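
Chris's second suggestion above (preparse everything and keep a
species->UID map) could be sketched roughly as follows. This is only a
minimal illustration in core Perl, not BioPerl code: organism_map() is a
hypothetical helper, the TEST000x records are made-up sample data, and
the handling of wrapped ORGANISM lines is a heuristic that assumes
lineage lines contain ';' or end with '.'.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): scan GenBank flat-file text
# and return a hash ref mapping organism name => [accession, ...].
sub organism_map {
    my ($fh) = @_;
    my (%by_org, $acc, $org, $in_org);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^ACCESSION\s+(\S+)/) {
            # Only the first accession on the line is kept.
            $acc    = $1;
            $in_org = 0;
        }
        elsif ($line =~ /^\s+ORGANISM\s+(.+?)\s*$/) {
            ($org, $in_org) = ($1, 1);
        }
        elsif ($line =~ m{^//}) {
            # End of record: store the pair and reset the state.
            push @{ $by_org{$org} }, $acc if defined $org && defined $acc;
            ($acc, $org, $in_org) = (undef, undef, 0);
        }
        elsif ($in_org && $line =~ /^\s+(\S.*?)\s*$/) {
            my $text = $1;
            # ASSUMPTION (heuristic): a lineage line contains ';' or ends
            # with '.'; anything else is a wrapped organism name.
            if ($text =~ /;/ || $text =~ /\.$/) { $in_org = 0 }
            else                                { $org .= " $text" }
        }
        else {
            $in_org = 0;
        }
    }
    return \%by_org;
}

# Made-up sample records, trimmed to the lines the parser looks at.
my $sample = <<'END';
LOCUS       TEST0001
ACCESSION   TEST0001
  ORGANISM  Xanthomonas campestris
            Bacteria; Proteobacteria; Gammaproteobacteria.
//
LOCUS       TEST0002
ACCESSION   TEST0002
  ORGANISM  Escherichia coli
            Bacteria; Proteobacteria; Gammaproteobacteria.
//
LOCUS       TEST0003
ACCESSION   TEST0003
  ORGANISM  Xanthomonas campestris
            Bacteria; Proteobacteria; Gammaproteobacteria.
//
LOCUS       TEST0004
ACCESSION   TEST0004
  ORGANISM  Xanthomonas oryzae pv.
            oryzae
            Bacteria; Proteobacteria; Gammaproteobacteria.
//
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $map = organism_map($fh);
close $fh;

# Roughly the '*Xanthomonas*' lookup from the original question.
for my $org (sort grep { /Xanthomonas/ } keys %$map) {
    print "$org: @{ $map->{$org} }\n";
}
```

Each accession found this way could then be handed to the existing
Bio::Index::GenBank index (for example via $inx->fetch($acc)) to pull
the full record. For a large flat file the map would probably want to
live in a tied hash on disk, as Chris suggests, rather than in memory.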

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org