[Bioperl-l] Bio::Index::GenBank - by organism?
Chris Fields
cjfields at illinois.edu
Tue Nov 10 03:55:01 UTC 2009
On Nov 9, 2009, at 6:05 PM, Jay Hannah wrote:
> Many thanks to Ewan Birney et al. for Bio::Index::*
>
> I can throw away my awful grep based index-by-accession stuff. :)
>
> Any chance someone has also written an organism based index
> mechanism? Something like...
>
> while (my $seq = $inx->get_Seq_by_organism('*Xanthomonas*')) {
> print $seq->display_id . "\n";
> }
>
> Thanks,
>
> j
It should work via id_parser(). From the Bio::Index::GenBank docs:
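use strict;
use warnings;
use Bio::Index::GenBank;
my ($index_file_name, $file_name) = @ARGV;
my $inx = Bio::Index::GenBank->new(-filename   => $index_file_name,
                                   -write_flag => 'WRITE');
# use a custom parser to choose the retrieval key
$inx->id_parser(\&get_id);
# make the index
$inx->make_index($file_name);
# here is where the retrieval key is specified; the parser is handed one
# line at a time, so return nothing for lines that carry no key
sub get_id {
    my $line = shift;
    return $line =~ /clone="(\S+)"/ ? $1 : ();
}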
Change the code ref to deal with the line you want and parse the name
out. Caveat: this may not be absolutely perfect (the parser is only
passed one line at a time, and some species lines will wrap). I'm also
not sure how this would work in cases where multiple sequences from the
same species are present.
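As a sketch of what that code ref might look like for the organism case, here is a hypothetical get_organism_id() that keys each record on the species name from its ORGANISM line instead of the clone name. It assumes, as the get_id() example implies, that the callback receives one line at a time and that whatever it returns becomes the retrieval key. Lookups would then need the exact species string, so a '*Xanthomonas*' style wildcard is not covered; the sample lines are made up.

```perl
use strict;
use warnings;

# Hypothetical organism-keyed parser: returns the species name from the
# ORGANISM line of a GenBank entry and nothing for any other line.
# It would be wired up the same way as get_id():
#   $inx->id_parser(\&get_organism_id);
#   $inx->make_index($file_name);
sub get_organism_id {
    my $line = shift;
    # e.g. "  ORGANISM  Xanthomonas campestris" -> "Xanthomonas campestris"
    # Lines arrive one at a time, so any part of a name that wraps onto
    # the next line is not captured.
    return $line =~ /^\s+ORGANISM\s+(.+?)\s*$/ ? $1 : ();
}

# Quick check against a few made-up GenBank lines; only the ORGANISM
# line should yield a key (SOURCE starts at column 0 and is skipped).
my @lines = (
    "LOCUS       XX000001   10 bp    DNA     linear   BCT 01-JAN-2009\n",
    "SOURCE      Xanthomonas campestris\n",
    "  ORGANISM  Xanthomonas campestris\n",
    "            Bacteria; Proteobacteria.\n",
);
my @keys = map { get_organism_id($_) } @lines;
print "$_\n" for @keys;
```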
The other option is to preparse everything and tie a hash to store a
species->UID map, then use that along with your Bio::Index index to
grab what you need.
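A minimal sketch of the pre-parse approach, in plain Perl so it runs without BioPerl: it walks the GenBank text once, stores species -> accession(s) in a hash tied to an SDBM_File database, and emulates the '*Xanthomonas*' wildcard with a regex over the species keys. The helper names build_species_map() and accessions_for(), the made-up sample entries, and the use of the first ACCESSION token as the UID are all assumptions for illustration; it also assumes each organism name fits on its ORGANISM line.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use SDBM_File;

# Walk one GenBank flat file and append each entry's primary accession to
# the value stored for its species. DBM values must be plain strings, so
# several accessions for one species are kept space-separated.
sub build_species_map {
    my ($fh, $map) = @_;
    my ($acc, $species);
    while (my $line = <$fh>) {
        if ($line =~ /^ACCESSION\s+(\S+)/) {
            $acc = $1;                       # first (primary) accession only
        } elsif ($line =~ /^\s+ORGANISM\s+(.+?)\s*$/) {
            $species = $1;                   # assumes a one-line name
        } elsif ($line =~ m{^//}) {          # end of entry: record the pair
            if (defined $acc && defined $species) {
                $map->{$species} = defined $map->{$species}
                    ? "$map->{$species} $acc"
                    : $acc;
            }
            ($acc, $species) = (undef, undef);
        }
    }
    return;
}

# Emulate the '*Xanthomonas*' wildcard: collect the accessions of every
# species whose name matches a regex, in sorted species order.
sub accessions_for {
    my ($map, $pattern) = @_;
    my @species = grep { /$pattern/ } keys %$map;
    return map { split(' ', $map->{$_}) } sort { $a cmp $b } @species;
}

# Made-up, heavily trimmed entries purely for illustration.
my $genbank = <<'GB';
LOCUS       XX000001   10 bp    DNA     linear   BCT 01-JAN-2009
ACCESSION   XX000001
  ORGANISM  Xanthomonas campestris
            Bacteria; Proteobacteria.
//
LOCUS       XX000002   10 bp    DNA     linear   BCT 01-JAN-2009
ACCESSION   XX000002
  ORGANISM  Escherichia coli
            Bacteria; Proteobacteria.
//
LOCUS       XX000003   10 bp    DNA     linear   BCT 01-JAN-2009
ACCESSION   XX000003
  ORGANISM  Xanthomonas campestris
            Bacteria; Proteobacteria.
//
GB

# Tie the map to an on-disk SDBM file (ships with core Perl) so it can be
# reused between runs. SDBM limits each key/value pair to roughly 1 KB,
# so a species with many accessions may need DB_File instead.
my $dir = tempdir(CLEANUP => 1);
tie my %species_map, 'SDBM_File', "$dir/species", O_RDWR | O_CREAT, 0666
    or die "Cannot tie species map: $!";

open my $fh, '<', \$genbank or die "Cannot read sample: $!";
build_species_map($fh, \%species_map);
close $fh;

my @hits = accessions_for(\%species_map, qr/Xanthomonas/);
print "$_\n" for @hits;
```

Each accession in @hits could then be passed to the Bio::Index object (for example through its fetch() method), assuming that index was built with the default accession-based keys. Keeping all accessions for a species in one value is what lets this route handle several sequences from the same species.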
chris