[Biopython-dev] [Bug 1942] New: GenBank RecordParser fails on particular qualifier structure

bugzilla-daemon at portal.open-bio.org
Fri Feb 3 04:44:54 EST 2006


http://bugzilla.open-bio.org/show_bug.cgi?id=1942

           Summary: GenBank RecordParser fails on particular qualifier
                    structure
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


When parsing some GenBank record files, GenBank.RecordParser raises an
AssertionError at a (poorly formatted) qualifier entry:

Python 2.3.4 (#1, Feb  2 2005, 12:11:53)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.GenBank import RecordParser
>>> parser = RecordParser()
>>> record = parser.parse(file('NC_002758.gbk'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py", line 240, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py", line 1533, in feed
    assert line[0:1]=='/', \
AssertionError: Expected start of new qualifier, not:
similar to bacteriophage terminase small subunit"

This problem has been observed with several GenBank .gbk files, including
NC_002758 (above) and NC_002929.  It appears to be caused by qualifiers
structured like /note in the following example:

     CDS             878043..878612
                     /locus_tag="SAV0800"
                     /note="
                     similar to bacteriophage terminase small subunit"
                     /codon_start=1
                     /transl_table=11
                     /product="similar to bacteriophage terminase small
                     subunit"
                     /protein_id="NP_371324.1"
                     /db_xref="GI:15923790"
                     /db_xref="GeneID:1120775"
                     /translation="MSELTAKQARFVNEYIRTLNVTQSAIKAGYSANSAHVTGCRLLK
                     KPHIKQYIQEQKDKIIDENVLTAKELLHVLTNAAVGDETETKEVVVKRGEYKENPQSG
                     KVQLVYNEHVELIEVPIKPSDRLKARDMLGKYHKLFTDKHDINGNVPIFINIGEWDGD
                     DEELDKTVKDVSNANPNHTVIVDDIPLED"

where the opening double quote of the qualifier value is immediately followed
by '\n', and the value text continues on the next line.  Editing the source
.gbk file directly to remove that line break, so that the text follows the
opening quote on the same line, resolves the problem.
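
Until the parser handles this case, the manual edit described above could be
automated by filtering the file before parsing.  The following is only a
rough sketch, not a fix to Bio.GenBank itself: the function name
join_bare_quote_qualifiers and the regular expression are hypothetical, it is
written for a current Python 3 rather than the Python 2.3 session shown
above, and it assumes the parser tolerates the resulting longer line.

```python
import re

# Hypothetical pattern: an indented qualifier line whose value consists of
# nothing but the opening double quote, e.g. '      /note="'.
_BARE_OPEN_QUOTE = re.compile(r'^\s+/\w+="\s*$')


def join_bare_quote_qualifiers(lines):
    """Yield lines, merging a bare '/key="' line with the line after it."""
    pending = None
    for line in lines:
        if pending is not None:
            # Glue the continuation text directly after the opening quote.
            yield pending + line.lstrip()
            pending = None
        elif _BARE_OPEN_QUOTE.match(line):
            # Hold this line back (without its newline) until the next one.
            pending = line.rstrip()
        else:
            yield line
    if pending is not None:
        # The file ended right after a bare opening quote; emit it unchanged.
        yield pending + "\n"


if __name__ == "__main__":
    sample = [
        '                     /locus_tag="SAV0800"\n',
        '                     /note="\n',
        '                     similar to bacteriophage terminase small subunit"\n',
        '                     /codon_start=1\n',
    ]
    for fixed_line in join_bare_quote_qualifiers(sample):
        print(fixed_line, end="")
```

The filtered lines could then be joined and wrapped in a file-like object
(for example io.StringIO) and passed to parser.parse() in place of the
original file handle.  Lines that do not match the pattern pass through
unchanged, so well-formed records should be unaffected.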





