[Bioperl-l] genbank species parsing (genbank.pm,v 1.121)

David Waner dwaner at scitegic.com
Fri Mar 24 23:52:57 UTC 2006


The GenBank reader in BioPerl 1.5.1 parses the species name of plant
hybrids like "Musa x paradisiaca" as species = "x", subspecies =
"paradisiaca".  It would be more useful (and would round-trip more
accurately) if this were parsed as species = "x paradisiaca", with no
subspecies. Perhaps this common special case should be handled in the
genbank.pm module.

- David Waner

Example excerpts:

	Original genbank file:
	SOURCE      Musa x paradisiaca
	  ORGANISM  Musa x paradisiaca
  
	Output from round-trip through BioPerl:
	SOURCE      Musa x paradisiaca paradisiaca
	  ORGANISM  Musa x
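
A minimal sketch of the rule suggested above, written in Python so that it
stays independent of genbank.pm internals. The function names
split_organism and join_organism are hypothetical and are not part of
BioPerl; the sketch only assumes that a standalone "x" (or the
multiplication sign) right after the genus marks a hybrid and should stay
attached to the epithet that follows it.

```python
HYBRID_MARKERS = ("x", "×")  # assumption: these tokens mark a hybrid name


def split_organism(name):
    """Split an ORGANISM name into (genus, species, subspecies).

    Hypothetical helper, not the genbank.pm implementation.
    """
    words = name.split()
    genus = words[0] if words else None
    rest = words[1:]
    if len(rest) > 1 and rest[0].lower() in HYBRID_MARKERS:
        # Hybrid: keep the marker together with the following epithet.
        species = " ".join(rest[:2])
        rest = rest[2:]
    elif rest:
        species = rest[0]
        rest = rest[1:]
    else:
        species = None
    # Anything left over is treated as the subspecies.
    subspecies = " ".join(rest) or None
    return genus, species, subspecies


def join_organism(genus, species, subspecies):
    """Rebuild the ORGANISM string from its parts, skipping empty ones."""
    return " ".join(part for part in (genus, species, subspecies) if part)


if __name__ == "__main__":
    parts = split_organism("Musa x paradisiaca")
    print(parts)
    print(join_organism(*parts))
```

With this rule the hybrid name splits into genus "Musa" and species
"x paradisiaca" with no subspecies, and joining the parts gives back the
original string, while ordinary two- and three-word names are split as
before.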
	  
Test case:

LOCUS       MSZ85965                 634 bp    DNA     linear   STS 28-FEB-2002
DEFINITION  Musa x paradisiaca DNA for sequence tagged microsatellite site
            (STMS), sequence tagged site.
ACCESSION   Z85965
VERSION     Z85965.1  GI:2266701
KEYWORDS    STS; microsatellite; STMS.
SOURCE      Musa x paradisiaca
  ORGANISM  Musa x paradisiaca
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Liliopsida; Zingiberales; Musaceae;
            Musa.
REFERENCE   1
  AUTHORS   Lagoda,P.J.L.
  TITLE     Banana STMS markers
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 634)
  AUTHORS   Lagoda,P.J.L.
  TITLE     Direct Submission
  JOURNAL   Submitted (31-JAN-1997) Lagoda P.J.L., CIRAD BIOTROP, AGETROP
            laboratory, Avenue du val de Montferrand, BP5035, 34032 Montpellier
            Cedex, France
FEATURES             Location/Qualifiers
     source          1..634
                     /organism="Musa x paradisiaca"
                     /mol_type="genomic DNA"
                     /cultivar="Gobusik"
                     /db_xref="taxon:89151"
                     /clone="pMaCIR561"
                     /cell_line="Madang"
                     /clone_lib="Pst 1"
     primer_bind     79..103
                     /note="Upper Primer AGMI145"
     repeat_region   165..188
                     /note="(TC)16 repeat"
     repeat_region   247..261
                     /note="(TC)6 repeat"
     primer_bind     279..297
                     /note="Lower Primer AGMI146"
ORIGIN      
        1 ctgcaggtaa ctggccgagt tgaacagtac caaccctgtt gtcacgaggc acataatgac
       61 tagagtaccc tccatccaag ctattacttg tttttatctt gaagacattt cagtctatnc
      121 aatcataagc atgattgaac cctctcattc gtgaaccgct accctctctc tctctctctc
      181 tctctctcca gcnacccttt nttngctctg tctaactact ctgtccctct cttggctctt
      241 gcacactcct ctctctctct ccccagtaat tgaacncctc ctgtcttttn tgtccttgct
      301 ccctcttctt tccagtcntc atnttatctc tnnctgcana anattgcacc atttccttac
      361 ttcttagtan tttcagattt ttaaatattt tccaatattg caccaaaatc ttggctgtct
      421 tattggtcca actagtaatc tgaggcttag taaagtcatt gttcagtttg agcttgataa
      481 ttatggttcg aatgcttaaa gactagtaaa tctacgggaa gggttacaan accccataaa
      541 attctagctt atactgnaat aaaaaaacnt cttccaacnt aacanccttt ccantatctc
      601 tcgggttttt caaaaggatt aaggnnggtg ttcc
//



