[Bioperl-l] about common_name
Jason Stajich
jason@cgt.mc.duke.edu
Fri, 22 Nov 2002 07:08:32 -0500 (EST)
We're working on the taxonomy interface to NCBI, together with Dan's new
taxonomy module, Bio::Taxonomy.
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
On Fri, 22 Nov 2002, Qiang Tu wrote:
> hello all,
>
> Sorry to bother you.
>
> I found a problem with Bio::Species. If you load a sequence and want to read
> the common name of its species, you can call
> $seq->species->common_name. But many sequence records do not carry a correct
> common name, so this method cannot return the right one.
> I think this could be solved by querying the NCBI taxonomy database, and I
> have written a prototype function. Should we add such a method to Bio::Species?
> Thanks.
>
> Running the script on some sequences gives:
> ==========
> bname is: Bos taurus
> cname1 is: Bos taurus (cow)
> cname2 is: cow
>
> bname is: Saccharomyces cerevisiae
> cname1 is: Saccharomyces cerevisiae
> cname2 is: baker's yeast
>
> bname is: Mus musculus
> cname1 is: Mus musculus
> cname2 is: house mouse
>
> bname is: Homo sapiens
> cname1 is: Homo sapiens (human)
> cname2 is: human
>
> ==========
>
> The script is:
>
> ==========
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use Bio::SeqIO;
> use LWP::Simple;
>
> # Read the first sequence from the GenBank file named on the command line.
> my $file = shift;
> my $io = Bio::SeqIO->new( '-file'   => $file,
>                           '-format' => 'genbank',
>                         );
> my $seq = $io->next_seq;
> my $bname  = $seq->species->binomial;
> my $cname1 = $seq->species->common_name;   # name stored in the record
> my $cname2 = ncbi_common_name($bname);     # name looked up at NCBI
>
> # Either common name may be missing; avoid printing an undefined value.
> $cname1 = '' unless defined $cname1;
> $cname2 = '' unless defined $cname2;
>
> print "bname is: $bname\n";
> print "cname1 is: $cname1\n";
> print "cname2 is: $cname2\n";
>
> # Look up the common name for a binomial name in the NCBI taxonomy
> # database. Returns nothing if a request fails or the match is ambiguous.
> sub ncbi_common_name {
>
>     my $bname = shift or return;
>
>     my $utils    = "http://www.ncbi.nlm.nih.gov/entrez/eutils";
>     my $esearch  = "$utils/esearch.fcgi?db=taxonomy&term=";
>     my $esummary = "$utils/esummary.fcgi?db=taxonomy&id=";
>     my $countid1 = '<eSearchResult>.*?<Count>';
>     my $countid2 = '</Count>';
>     my $id1      = '<Id>';
>     my $id2      = '</Id>';
>     my $cnameid1 = '<Item.*?CommonName.*?>';
>     my $cnameid2 = '</Item>';
>
>     # Build a quoted phrase query, e.g. "Bos+taurus".
>     $bname =~ s/\s+/+/g;
>     $bname = '"'.$bname.'"';
>
>     my $esearch_result = get($esearch . $bname) or return;
>
>     # Continue only if the search matched exactly one taxon.
>     my $count;
>     if ($esearch_result =~ /$countid1(\d+)$countid2/s) {
>         $count = $1;
>     }
>     return unless defined $count && $count == 1;
>
>     my $id;
>     if ($esearch_result =~ /$id1(\d+)$id2/) {
>         $id = $1;
>     }
>     return if (!$id);
>
>     my $esummary_result = get($esummary . $id) or return;
>
>     # Pull the CommonName item out of the summary record.
>     my $cname;
>     if ($esummary_result =~ /$cnameid1(.*?)$cnameid2/) {
>         $cname = $1;
>     }
>
>     return $cname;
> }
>
>
> ==========
>
>
>
> Qiang Tu
> Institute of Biochemistry and Cell Biology
> Chinese Academy of Sciences
> Email: tuqiang@mail.shcnc.ac.cn, tuqiang_cn@yahoo.com
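
The parsing steps in the quoted script can be tried without network access. What follows is only a minimal sketch under stated assumptions: the two XML strings are made-up samples shaped the way the quoted regular expressions expect, not captured NCBI output, and the helper names single_id and common_name are hypothetical and not part of Bio::Species or any BioPerl module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: hand-written sample responses, not real NCBI output.
my $esearch_xml = <<'XML';
<eSearchResult><Count>1</Count><RetMax>1</RetMax><RetStart>0</RetStart>
<IdList>
<Id>9913</Id>
</IdList>
</eSearchResult>
XML

my $esummary_xml = <<'XML';
<eSummaryResult>
<DocSum>
<Id>9913</Id>
<Item Name="ScientificName" Type="String">Bos taurus</Item>
<Item Name="CommonName" Type="String">cow</Item>
</DocSum>
</eSummaryResult>
XML

# Hypothetical helper: return the taxon ID only when the search
# reported exactly one match; otherwise return nothing.
sub single_id {
    my $xml = shift;
    return unless defined $xml;
    my ($count) = $xml =~ m{<Count>(\d+)</Count>};
    return unless defined $count && $count == 1;
    my ($id) = $xml =~ m{<Id>(\d+)</Id>};
    return $id;
}

# Hypothetical helper: return the text of the CommonName item,
# or nothing when the item is absent or empty.
sub common_name {
    my $xml = shift;
    return unless defined $xml;
    my ($name) = $xml =~ m{<Item\s+Name="CommonName"[^>]*>([^<]*)</Item>};
    return unless defined $name && length $name;
    return $name;
}

my $id    = single_id($esearch_xml);
my $cname = common_name($esummary_xml);
print "id is: ",    (defined $id    ? $id    : 'unknown'), "\n";
print "cname is: ", (defined $cname ? $cname : 'unknown'), "\n";
```

The CommonName pattern here is anchored on the Name attribute instead of using '<Item.*?CommonName.*?>', so it cannot start matching at an earlier Item element on the same line. Both helpers return nothing on an ambiguous or empty result, so a caller can fall back to the common name stored in the sequence record.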