[Bioperl-l] about common_name
Qiang Tu
tuqiang@mail.shcnc.ac.cn
Fri, 22 Nov 2002 15:30:31 GMT
Hello all,
Sorry to bother you.
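I found a problem with Bio::Species. If you load a sequence and want the
common name of its species, you would use $seq->species->common_name.
However, many sequence records do not carry a proper common name, so this
method often cannot give you the right one. In the output below it returns
either just the binomial (e.g. "Mus musculus") or the binomial with the
common name in parentheses (e.g. "Bos taurus (cow)").
I think this can be solved by querying the NCBI Taxonomy database, and I
have written a prototype function that does so (ncbi_common_name in the
script below). Should we add such a method to Bio::Species?
Thanks.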
I ran the script on some GenBank sequences and got:
==========
bname is: Bos taurus
cname1 is: Bos taurus (cow)
cname2 is: cow
bname is: Saccharomyces cerevisiae
cname1 is: Saccharomyces cerevisiae
cname2 is: baker's yeast
bname is: Mus musculus
cname1 is: Mus musculus
cname2 is: house mouse
bname is: Homo sapiens
cname1 is: Homo sapiens (human)
cname2 is: human
==========
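The cname1 lines above suggest a cheap offline fallback for records whose
common_name is either the binomial alone or the binomial followed by the
common name in parentheses. A minimal sketch, assuming the stored string
always follows one of those two patterns; local_common_name is a
hypothetical helper, not part of Bio::Species:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Species): derive a usable common
# name from the string stored in the record, given the binomial.
#   "Bos taurus (cow)" -> "cow"
#   "Mus musculus"     -> undef (the record holds no real common name)
sub local_common_name {
    my ($binomial, $common) = @_;
    return unless defined $common && length $common;

    # Binomial followed by a parenthesised common name
    if (defined $binomial && $common =~ /^\Q$binomial\E\s*\((.+)\)\s*$/) {
        return $1;
    }

    # The "common name" is just the binomial again: nothing useful here
    return if defined $binomial && $common eq $binomial;

    # Otherwise assume the stored string is already a common name
    return $common;
}

for my $pair (['Bos taurus',   'Bos taurus (cow)'],
              ['Mus musculus', 'Mus musculus']) {
    my $name = local_common_name(@$pair);
    printf "%s -> %s\n", $pair->[1], defined $name ? $name : '(none)';
}
```

It cannot recover a name the record simply lacks (Mus musculus yields
nothing), which is why the NCBI lookup is still needed.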
And here is the script:
==========
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use LWP::Simple qw(get);
use URI::Escape qw(uri_escape);

my $file = shift or die "usage: $0 genbank_file\n";
my $io = Bio::SeqIO->new( -file   => $file,
                          -format => 'genbank' );

# Compare the common name stored in each record with the one from NCBI
while ( my $seq = $io->next_seq ) {
    my $species = $seq->species or next;   # skip records with no organism
    my $bname  = $species->binomial;
    my $cname1 = $species->common_name;
    my $cname2 = ncbi_common_name($bname);
    print "bname is: $bname\n";
    print "cname1 is: ", (defined $cname1 ? $cname1 : ''), "\n";
    print "cname2 is: ", (defined $cname2 ? $cname2 : ''), "\n";
}

# Look up the common name for a binomial in the NCBI Taxonomy database
# via the E-utilities; returns undef on failure or an ambiguous match.
sub ncbi_common_name {
    my $bname = shift or return;
    my $utils    = "http://www.ncbi.nlm.nih.gov/entrez/eutils";
    my $esearch  = "$utils/esearch.fcgi?db=taxonomy&term=";
    my $esummary = "$utils/esummary.fcgi?db=taxonomy&id=";

    # Search for the exact (quoted) name, properly URL-escaped
    my $term = uri_escape(qq{"$bname"});
    my $esearch_result = get($esearch . $term) or return;

    # Accept only an unambiguous hit; avoids comparing undef with !=
    my ($count) = $esearch_result =~ m{<eSearchResult>.*?<Count>(\d+)</Count>}s;
    return unless defined $count && $count == 1;
    my ($id) = $esearch_result =~ m{<Id>(\d+)</Id>};
    return unless $id;

    # Fetch the summary record and pull out the CommonName item
    my $esummary_result = get($esummary . $id) or return;
    my ($cname) = $esummary_result =~ m{<Item[^>]*Name="CommonName"[^>]*>(.*?)</Item>};
    return $cname;
}
==========
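The regular-expression steps in ncbi_common_name can also be pulled out
into pure functions and checked against a saved reply without network
access. A minimal sketch, assuming the E-utilities reply layout shown in
the sample strings (abbreviated by hand, not a verbatim NCBI response);
parse_esearch_id and parse_esummary_common_name are hypothetical names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the single taxonomy ID from an ESearch XML reply, or undef
# when the search matched zero or several records.
sub parse_esearch_id {
    my ($xml) = @_;
    return unless defined $xml;
    my ($count) = $xml =~ m{<Count>(\d+)</Count>};   # first (top-level) Count
    return unless defined $count && $count == 1;
    my ($id) = $xml =~ m{<Id>(\d+)</Id>};
    return $id;
}

# Return the CommonName item from an ESummary XML reply, or undef if it
# is missing or empty. [^<]* keeps the match inside one Item element.
sub parse_esummary_common_name {
    my ($xml) = @_;
    return unless defined $xml;
    my ($cname) = $xml =~ m{<Item\s+Name="CommonName"[^>]*>([^<]*)</Item>};
    return unless defined $cname && length $cname;
    return $cname;
}

# Abbreviated sample replies (assumed layout, hand-written for testing)
my $esearch_xml = <<'XML';
<eSearchResult><Count>1</Count><RetMax>1</RetMax><RetStart>0</RetStart><IdList>
<Id>10090</Id>
</IdList></eSearchResult>
XML

my $esummary_xml = <<'XML';
<eSummaryResult>
<DocSum>
  <Id>10090</Id>
  <Item Name="Rank" Type="String">species</Item>
  <Item Name="ScientificName" Type="String">Mus musculus</Item>
  <Item Name="CommonName" Type="String">house mouse</Item>
</DocSum>
</eSummaryResult>
XML

my $id    = parse_esearch_id($esearch_xml);
my $cname = parse_esummary_common_name($esummary_xml);
print "id is: $id\n";
print "cname is: $cname\n";
```

Inside ncbi_common_name the two extraction steps would then become
parse_esearch_id($esearch_result) and
parse_esummary_common_name($esummary_result), leaving only the HTTP
requests untested offline.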
Qiang Tu
Institute of Biochemistry and Cell Biology
Chinese Academy of Sciences
Email: tuqiang@mail.shcnc.ac.cn, tuqiang_cn@yahoo.com