[Bioperl-l] Bio::SeqIO::genbank, Bio::Species - can't get full species name
Matthew Betts
Matthew.Betts at ii.uib.no
Thu May 13 09:35:54 EDT 2004
Hi,
I am trying to reconcile gene trees with species trees, and to do this I
need the species names to be the same in both. The gene trees come
from a clustering of GenBank coding sequences, and the species trees come
from the NCBI taxonomy. When I use BioPerl to extract the species
info from GenBank entries, it only seems possible to get the first
three words of the ORGANISM line, which Bio::Species treats as genus,
species and subspecies. In several cases, such as the example below,
the ORGANISM line contains more than three words. I suspect this means
either that the subspecies name uses more than one word, or that the
entry does not follow the GenBank format. However, the names appear in
exactly this form in the NCBI taxonomy names.dmp file.
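To make the symptom concrete, here is a minimal Python sketch of a three-word split applied to the example name. It is a simplified model of the behaviour described above, not BioPerl's actual code, and the function name three_word_split is hypothetical.

```python
def three_word_split(organism):
    """Split an ORGANISM string into three name slots (hypothetical helper)."""
    words = organism.split()
    # Pad with None so names shorter than three words still unpack cleanly.
    genus, species, subspecies = (words + [None, None, None])[:3]
    dropped = words[3:]  # anything past the third word is lost
    return genus, species, subspecies, dropped


parts = three_word_split("Tamias amoenus X Tamias ruficaudus")
print(parts)
```

For the hybrid name, the "subspecies" slot ends up holding "X" and the second parent species is dropped entirely, so the result cannot match the NCBI taxonomy name.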
The problem seems to be in Bio::SeqIO::genbank->_read_GenBank_Species().
There is a special case there for viruses (the whole of the ORGANISM
info is pushed onto the classification array), but the examples I have are
for chordates, and other groups may be affected too.
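One possible way around the parser, sketched here in plain Python rather than BioPerl, is to read the full ORGANISM line and the taxon db_xref straight from the flat file and match on the taxon ID instead of the name. This is only a sketch under stated assumptions: the organism name fits on a single line (long names that wrap are not handled), the record has a taxon db_xref, and the function name organism_and_taxon and the trimmed SAMPLE record are illustrative.

```python
import re


def organism_and_taxon(record_text):
    """Return (full ORGANISM name, NCBI taxon id or None) for one GenBank record."""
    organism = None
    taxon_id = None
    for line in record_text.splitlines():
        stripped = line.strip()
        if organism is None and stripped.startswith("ORGANISM"):
            # Keep everything after the keyword, not just the first three words.
            organism = stripped[len("ORGANISM"):].strip()
        match = re.search(r'/db_xref="taxon:(\d+)"', stripped)
        if match and taxon_id is None:
            taxon_id = int(match.group(1))
    return organism, taxon_id


# Trimmed copy of the example record, with GenBank-style indentation.
SAMPLE = """\
SOURCE      mitochondrion Tamias amoenus X Tamias ruficaudus
  ORGANISM  Tamias amoenus X Tamias ruficaudus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Sciuridae; Sciurinae;
            Tamias.
FEATURES             Location/Qualifiers
     source          1..701
                     /organism="Tamias amoenus X Tamias ruficaudus"
                     /db_xref="taxon:231237"
"""

name, taxid = organism_and_taxon(SAMPLE)
print(name, taxid)
```

The taxon ID is the same identifier used in the first column of names.dmp, so matching on it avoids depending on how the name is split into words.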
I'd be really grateful for any comments on the best thing for me to do.
Thanks,
Matthew
LOCUS       AY211864                 701 bp    DNA     linear   ROD 25-AUG-2003
DEFINITION  Tamias amoenus X Tamias ruficaudus RBCM19680 cytochrome b (cytb)
            gene, partial cds; mitochondrial gene for mitochondrial product.
ACCESSION   AY211864
VERSION     AY211864.1  GI:33385214
KEYWORDS    .
SOURCE      mitochondrion Tamias amoenus X Tamias ruficaudus
  ORGANISM  Tamias amoenus X Tamias ruficaudus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Sciuridae; Sciurinae;
            Tamias.
REFERENCE   1  (bases 1 to 701)
  AUTHORS   Good,J.M., Demboski,J.R., Nagorsen,D.W. and Sullivan,J.
  TITLE     Phylogeography and introgressive hybridization: chipmunks (genus
            Tamias) in the northern Rocky Mountains
  JOURNAL   Evolution 57 (8), 1900-1916 (2003)
REFERENCE   2  (bases 1 to 701)
  AUTHORS   Good,J.M., Demboski,J.R., Nagorsen,D.W. and Sullivan,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-JAN-2003) Ecology and Evolutionary Biology,
            University of Arizona, 1041 E. Lowell Street, Tucson, AZ 85721, USA
FEATURES             Location/Qualifiers
     source          1..701
                     /organism="Tamias amoenus X Tamias ruficaudus"
                     /organelle="mitochondrion"
                     /mol_type="genomic DNA"
                     /specimen_voucher="Royal British Columbia Museum
                     (RBCM19680)"
                     /db_xref="taxon:231237"
     gene            1..>701
                     /gene="cytb"
     CDS             1..>701
                     /gene="cytb"
                     /codon_start=1
                     /transl_table=2
                     /product="cytochrome b"
                     /protein_id="AAP45298.1"
                     /db_xref="GI:33385215"
                     /translation="MTNIRKTHPLIKIINHSFIDLPAPSNISAWWNFGSLLGICLIIQ
                     ILTGLFLAMHYTSDTMTAFSSVTHICRDVNYGWLIRYMHANGASMFFICLFLHVGRGL
                     YYGSYTYFETWNIGVILLFAVMATAFMGYVLPWGQMSFWGATVITNLLSAIPYIGTTL
                     VEWIWGGFSVDKATLTRFFAFHFILPFIITALVMVHLLFLHETGSNNPSGLISDSDKI
                     PFHPYYTIKDILGILL"
ORIGIN
        1 atgacaaaca tccgcaaaac ccatcccctc attaaaatca ttaaccactc attcattgac
       61 ttacccgcac catccaacat ttctgcatga tgaaattttg gatccctctt aggtatttgc
      121 ctaattatcc aaattctcac tggactattc ctagcaatac actacacatc cgacacaatg
      181 acagctttct catctgtcac tcatatttgc cgagatgtaa actacggctg acttatccga
      241 tacatacacg ctaacggagc ctccatattt tttatctgcc tattccttca tgtaggccga
      301 ggactttact atggatcata tacctacttc gaaacatgaa acattggagt aattctttta
      361 ttcgccgtta tagccactgc atttataggt tacgttctcc catgaggaca gatatccttt
      421 tgaggtgcta ctgttattac aaatctccta tcagccatcc catatatcgg aacaacacta
      481 gtagaatgaa tctgaggagg cttctcagta gacaaagcca ctctaacacg attctttgca
      541 tttcatttta tcctcccatt cattattaca gcattagtta tagttcacct actcttcctt
      601 catgaaaccg gatccaataa tccttccgga ttaatctctg actctgataa aattccattc
      661 catccatatt acactattaa agatatccta ggcatcctcc t
//