[Bioperl-l] about common_name

Jason Stajich jason@cgt.mc.duke.edu
Fri, 22 Nov 2002 07:08:32 -0500 (EST)

We're working on a taxonomy interface to NCBI, along with Dan's new
taxonomy module, Bio::Taxonomy.

Jason Stajich
Duke University
jason at cgt.mc.duke.edu

On Fri, 22 Nov 2002, Qiang Tu wrote:

> Hello all,
> Sorry to bother you.
> I found a problem with Bio::Species. If you load a sequence and want to read
> the common name of its species, you can call
> $seq->species->common_name. But many sequences do not carry a correct
> common name, so this method cannot return the correct name.
> I think this could be solved by querying the NCBI taxonomy database, and I
> have written a prototype function that does so. Should we add such a method
> to Bio::Species?
> Thanks.
> Running the script on some sequences gives this result:
> ==========
> bname  is: Bos taurus
> cname1 is: Bos taurus (cow)
> cname2 is: cow
> bname  is: Saccharomyces cerevisiae
> cname1 is: Saccharomyces cerevisiae
> cname2 is: baker's yeast
> bname  is: Mus musculus
> cname1 is: Mus musculus
> cname2 is: house mouse
> bname  is: Homo sapiens
> cname1 is: Homo sapiens (human)
> cname2 is: human
> ==========
> The script is:
> ==========
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::SeqIO;
> use LWP::Simple;
> my $file = shift;
> my $io = Bio::SeqIO->new( '-file' => $file,
>                           '-format' => 'genbank',
>                          );
> my $seq = $io->next_seq;
> my $bname  = $seq->species->binomial;
> my $cname1 = $seq->species->common_name;
> my $cname2 = ncbi_common_name($bname);
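> # Either name may be undefined (no common name stored, or the lookup failed).
> $cname1 = '' unless defined $cname1;
> $cname2 = '' unless defined $cname2;
> print "bname  is: $bname\n";
> print "cname1 is: $cname1\n";
> print "cname2 is: $cname2\n";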
> sub ncbi_common_name {
>     my $bname    = shift or return;
>     my $utils    = "http://www.ncbi.nlm.nih.gov/entrez/eutils";
>     my $esearch  = "$utils/esearch.fcgi?db=taxonomy&term=";
>     my $esummary = "$utils/esummary.fcgi?db=taxonomy&id=";
>     my $countid1 = '<eSearchResult>.*?<Count>';
>     my $countid2 = '</Count>';
>     my $id1      = '<Id>';
>     my $id2      = '</Id>';
>     my $cnameid1 = '<Item.*?CommonName.*?>';
>     my $cnameid2 = '</Item>';
>     $bname =~ s/\s+/+/g;
>     $bname = '"'.$bname.'"';
>     my $esearch_result = get($esearch . $bname) or return;
>     my $count;
>     if ($esearch_result =~ /$countid1(\d+)$countid2/s) {
>         $count = $1;
>     }
>     # Require exactly one hit; $count is undef if the pattern did not match.
>     return unless defined $count && $count == 1;
>     my $id;
>     if ($esearch_result =~ /$id1(\d+)$id2/) {
>         $id = $1;
>     }
>     return if (!$id);
>     my $esummary_result = get($esummary . $id) or return;
>     my $cname;
>     if ($esummary_result =~ /$cnameid1(.*?)$cnameid2/) {
>         $cname = $1;
>     }
>     return $cname;
> }
> ==========
> Qiang Tu
> Institute of Biochemistry and Cell Biology
> Chinese Academy of Sciences
> Email: tuqiang@mail.shcnc.ac.cn, tuqiang_cn@yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
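
The parsing half of the quoted ncbi_common_name function can be tried
without any network access. The following is a minimal sketch, not part of
Bioperl: the two XML strings are hand-written samples shaped the way the
script's patterns expect (they are assumptions, not captured NCBI output),
and the helper names single_taxon_id and common_name_from_summary are
hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written sample responses (assumed shape, not real server output).
my $esearch_xml = <<'XML';
<eSearchResult>
  <Count>1</Count>
  <IdList>
    <Id>9913</Id>
  </IdList>
</eSearchResult>
XML

my $esummary_xml = <<'XML';
<eSummaryResult>
  <DocSum>
    <Id>9913</Id>
    <Item Name="ScientificName" Type="String">Bos taurus</Item>
    <Item Name="CommonName" Type="String">cow</Item>
  </DocSum>
</eSummaryResult>
XML

# Return the taxonomy ID from an esearch response, or nothing when the
# hit count is missing or is not exactly 1 (an ambiguous or empty search).
sub single_taxon_id {
    my $xml = shift;
    return unless defined $xml;
    my ($count) = $xml =~ m{<eSearchResult>.*?<Count>(\d+)</Count>}s;
    return unless defined $count && $count == 1;
    my ($id) = $xml =~ m{<Id>(\d+)</Id>};
    return $id;
}

# Return the text of the <Item> whose Name attribute is "CommonName",
# or nothing when that item is absent or empty.
sub common_name_from_summary {
    my $xml = shift;
    return unless defined $xml;
    my ($cname) =
        $xml =~ m{<Item[^>]*\bName="CommonName"[^>]*>([^<]*)</Item>};
    return unless defined $cname && length $cname;
    return $cname;
}

my $id    = single_taxon_id($esearch_xml);
my $cname = common_name_from_summary($esummary_xml);
print "id is: ",    (defined $id    ? $id    : 'undef'), "\n";
print "cname is: ", (defined $cname ? $cname : 'undef'), "\n";
```

With the sample strings above this prints "id is: 9913" and "cname is: cow".
The item pattern uses [^>]* rather than .*? so that a match cannot run from
one <Item> tag into the next. For anything beyond a prototype, a real XML
parser would likely be more robust than regular expressions.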