[Bioperl-l] Bio::SeqIO::genbank, Bio::Species - can't get full species name

Jason Stajich jason at cgt.duhs.duke.edu
Thu May 13 09:50:03 EDT 2004


Does the array returned by

 my @classification = $species->classification();

contain everything you need?
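
If it does not, one fallback is to take the name straight from the
ORGANISM line of the record instead of from Bio::Species. The following
is only a minimal sketch in plain Perl, not BioPerl code: organism_name
is a hypothetical helper, and it assumes the organism name fits on a
single line (a name that wraps onto a continuation line would be cut
short).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the text that follows
# the ORGANISM keyword in one GenBank flat-file record, or undef if the
# record has no ORGANISM line.
# Assumption: the organism name does not wrap onto a second line.
sub organism_name {
    my ($record) = @_;
    # ORGANISM is indented under SOURCE; capture the rest of that line,
    # without trailing spaces or tabs.
    if ($record =~ /^[ \t]*ORGANISM[ \t]+(.+?)[ \t]*$/m) {
        return $1;
    }
    return;
}

# Excerpt of the record quoted in this thread (accession AY211864).
my $record = <<'END';
SOURCE      mitochondrion Tamias amoenus X Tamias ruficaudus
  ORGANISM  Tamias amoenus X Tamias ruficaudus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Sciuridae; Sciurinae;
            Tamias.
END

my $name = organism_name($record);
print defined $name ? "$name\n" : "no ORGANISM line found\n";
```

The string returned this way can be compared directly with the names in
the NCBI taxonomy names.dmp file, since it is not split into genus,
species, and subspecies.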

-jason
On Thu, 13 May 2004, Matthew Betts wrote:

>
> Hi,
>
> I am trying to reconcile gene trees with species trees, and to do this I
> need the species names to be the same in both cases. The gene trees come
> from a clustering of GenBank coding sequences, and the species trees come
> from the NCBI taxonomy. However, when using BioPerl to extract the species
> info from GenBank entries, it only seems possible to get the first
> three words from the ORGANISM line, which are treated as genus, species,
> and subspecies in Bio::Species. In several cases, though, such as the
> example below, the ORGANISM line holds more than three words. I suspect
> this means either that the subspecies name spans more than one word or
> that the record breaks the GenBank format. Yet this is also how the
> names appear in the NCBI taxonomy names.dmp file.
>
> The problem seems to be in Bio::SeqIO::genbank->_read_GenBank_Species().
> There is a special condition there for viruses (the whole of the ORGANISM
> info is put on to the classification array), but the examples I have are
> for chordates (there may be others).
>
> I'd be really grateful for any comments on the best thing for me to do.
>
> Thanks,
>
> Matthew
>
>
>
> LOCUS       AY211864                 701 bp    DNA     linear   ROD 25-AUG-2003
> DEFINITION  Tamias amoenus X Tamias ruficaudus RBCM19680 cytochrome b (cytb)
>             gene, partial cds; mitochondrial gene for mitochondrial product.
> ACCESSION   AY211864
> VERSION     AY211864.1  GI:33385214
> KEYWORDS    .
> SOURCE      mitochondrion Tamias amoenus X Tamias ruficaudus
>   ORGANISM  Tamias amoenus X Tamias ruficaudus
>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>             Mammalia; Eutheria; Rodentia; Sciurognathi; Sciuridae; Sciurinae;
>             Tamias.
> REFERENCE   1  (bases 1 to 701)
>   AUTHORS   Good,J.M., Demboski,J.R., Nagorsen,D.W. and Sullivan,J.
>   TITLE     Phylogeography and introgressive hybridization: chipmunks (genus
>             Tamias) in the northern Rocky Mountains
>   JOURNAL   Evolution 57 (8), 1900-1916 (2003)
> REFERENCE   2  (bases 1 to 701)
>   AUTHORS   Good,J.M., Demboski,J.R., Nagorsen,D.W. and Sullivan,J.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (08-JAN-2003) Ecology and Evolutionary Biology,
>             University of Arizona, 1041 E. Lowell Street, Tucson, AZ 85721, USA
> FEATURES             Location/Qualifiers
>      source          1..701
>                      /organism="Tamias amoenus X Tamias ruficaudus"
>                      /organelle="mitochondrion"
>                      /mol_type="genomic DNA"
>                      /specimen_voucher="Royal British Columbia Museum
>                      (RBCM19680)"
>                      /db_xref="taxon:231237"
>      gene            1..>701
>                      /gene="cytb"
>      CDS             1..>701
>                      /gene="cytb"
>                      /codon_start=1
>                      /transl_table=2
>                      /product="cytochrome b"
>                      /protein_id="AAP45298.1"
>                      /db_xref="GI:33385215"
>                      /translation="MTNIRKTHPLIKIINHSFIDLPAPSNISAWWNFGSLLGICLIIQ
>                      ILTGLFLAMHYTSDTMTAFSSVTHICRDVNYGWLIRYMHANGASMFFICLFLHVGRGL
>                      YYGSYTYFETWNIGVILLFAVMATAFMGYVLPWGQMSFWGATVITNLLSAIPYIGTTL
>                      VEWIWGGFSVDKATLTRFFAFHFILPFIITALVMVHLLFLHETGSNNPSGLISDSDKI
>                      PFHPYYTIKDILGILL"
> ORIGIN
>         1 atgacaaaca tccgcaaaac ccatcccctc attaaaatca ttaaccactc attcattgac
>        61 ttacccgcac catccaacat ttctgcatga tgaaattttg gatccctctt aggtatttgc
>       121 ctaattatcc aaattctcac tggactattc ctagcaatac actacacatc cgacacaatg
>       181 acagctttct catctgtcac tcatatttgc cgagatgtaa actacggctg acttatccga
>       241 tacatacacg ctaacggagc ctccatattt tttatctgcc tattccttca tgtaggccga
>       301 ggactttact atggatcata tacctacttc gaaacatgaa acattggagt aattctttta
>       361 ttcgccgtta tagccactgc atttataggt tacgttctcc catgaggaca gatatccttt
>       421 tgaggtgcta ctgttattac aaatctccta tcagccatcc catatatcgg aacaacacta
>       481 gtagaatgaa tctgaggagg cttctcagta gacaaagcca ctctaacacg attctttgca
>       541 tttcatttta tcctcccatt cattattaca gcattagtta tagttcacct actcttcctt
>       601 catgaaaccg gatccaataa tccttccgga ttaatctctg actctgataa aattccattc
>       661 catccatatt acactattaa agatatccta ggcatcctcc t
> //

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

