[Bioperl-l] genbank species parsing (genbank.pm,v 1.121)
David Waner
dwaner at scitegic.com
Fri Mar 24 23:52:57 UTC 2006
The GenBank reader in BioPerl 1.5.1 parses the species name of plant
hybrids such as "Musa x paradisiaca" as species = "x", subspecies =
"paradisiaca". It would be more useful, and would round-trip more
accurately, if this were parsed as species = "x paradisiaca" with no
subspecies. Perhaps this common special case should be handled in the
genbank.pm module.
- David Waner
Example excerpts:
Original genbank file:
SOURCE Musa x paradisiaca
ORGANISM Musa x paradisiaca
Output from round-trip through BioPerl:
SOURCE Musa x paradisiaca paradisiaca
ORGANISM Musa x
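To make the suggested behaviour concrete, here is a minimal sketch of the
splitting rule, written in Python for illustration only. It is not the code
in genbank.pm, and the function names split_organism and join_organism are
hypothetical. It assumes the hybrid marker is a lone "x" (or the
multiplication sign) directly after the genus, and that it should stay
attached to the following epithet.

```python
def split_organism(name):
    """Split an ORGANISM name into (genus, species, subspecies).

    Hypothetical helper: treats a lone "x" (or the multiplication sign)
    right after the genus as a hybrid marker that belongs to the species.
    """
    words = name.split()
    genus = words[0] if words else ""
    rest = words[1:]
    if len(rest) >= 2 and rest[0] in ("x", "\u00d7"):
        # Hybrid: keep the marker and the epithet together as the species.
        species = " ".join(rest[:2])
        subspecies = " ".join(rest[2:])
    else:
        # Ordinary case: second word is the species, the rest is subspecies.
        species = rest[0] if rest else ""
        subspecies = " ".join(rest[1:])
    return genus, species, subspecies


def join_organism(genus, species, subspecies):
    """Rebuild the ORGANISM name, skipping empty parts."""
    return " ".join(part for part in (genus, species, subspecies) if part)


if __name__ == "__main__":
    parts = split_organism("Musa x paradisiaca")
    print(parts)
    print(join_organism(*parts))
```

With this rule the name is written back out unchanged, because the species
field carries the hybrid marker and the subspecies field stays empty.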
Test case:
LOCUS MSZ85965 634 bp DNA linear STS 28-FEB-2002
DEFINITION Musa x paradisiaca DNA for sequence tagged microsatellite site
(STMS), sequence tagged site.
ACCESSION Z85965
VERSION Z85965.1 GI:2266701
KEYWORDS STS; microsatellite; STMS.
SOURCE Musa x paradisiaca
ORGANISM Musa x paradisiaca
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; Liliopsida; Zingiberales; Musaceae;
Musa.
REFERENCE 1
AUTHORS Lagoda,P.J.L.
TITLE Banana STMS markers
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 634)
AUTHORS Lagoda,P.J.L.
TITLE Direct Submission
JOURNAL Submitted (31-JAN-1997) Lagoda P.J.L., CIRAD BIOTROP, AGETROP
laboratory, Avenue du val de Montferrand, BP5035, 34032 Montpellier
Cedex, France
FEATURES Location/Qualifiers
source 1..634
/organism="Musa x paradisiaca"
/mol_type="genomic DNA"
/cultivar="Gobusik"
/db_xref="taxon:89151"
/clone="pMaCIR561"
/cell_line="Madang"
/clone_lib="Pst 1"
primer_bind 79..103
/note="Upper Primer AGMI145"
repeat_region 165..188
/note="(TC)16 repeat"
repeat_region 247..261
/note="(TC)6 repeat"
primer_bind 279..297
/note="Lower Primer AGMI146"
ORIGIN
1 ctgcaggtaa ctggccgagt tgaacagtac caaccctgtt gtcacgaggc acataatgac
61 tagagtaccc tccatccaag ctattacttg tttttatctt gaagacattt cagtctatnc
121 aatcataagc atgattgaac cctctcattc gtgaaccgct accctctctc tctctctctc
181 tctctctcca gcnacccttt nttngctctg tctaactact ctgtccctct cttggctctt
241 gcacactcct ctctctctct ccccagtaat tgaacncctc ctgtcttttn tgtccttgct
301 ccctcttctt tccagtcntc atnttatctc tnnctgcana anattgcacc atttccttac
361 ttcttagtan tttcagattt ttaaatattt tccaatattg caccaaaatc ttggctgtct
421 tattggtcca actagtaatc tgaggcttag taaagtcatt gttcagtttg agcttgataa
481 ttatggttcg aatgcttaaa gactagtaaa tctacgggaa gggttacaan accccataaa
541 attctagctt atactgnaat aaaaaaacnt cttccaacnt aacanccttt ccantatctc
601 tcgggttttt caaaaggatt aaggnnggtg ttcc
//