[Biopython-dev] [Bug 2353] New: Problem parsing Swissprot (UniProt) files

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Wed Aug 29 08:25:12 UTC 2007


           Summary: Problem parsing Swissprot (UniProt) files
           Product: Biopython
           Version: 1.43
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: ibdeno at gmail.com

I installed biopython-py24-1.43-1001 via Fink on an iBook G4.

I have found that parsing a UniProt database file for the archaeon
M. thermoautotrophicum (downloaded from Integr8) with Bio.SwissProt fails
with an error. For example, the following code (saved as testing.py):


# reading a SwissProt entry from a file

from Bio.SwissProt import SProt
import sys

handle = open(sys.argv[1])
sp = SProt.Iterator(handle, SProt.RecordParser())
record = sp.next()
print record.entry_name
print record.sequence


when run as:

python2.4 testing.py 27.M_thermoautotrophicum.dat

produces the following traceback:

Traceback (most recent call last):
  File "testing.py", line 8, in ?
    record = sp.next()
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 172, in
    return self._parser.parse(File.StringHandle(data))
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 296, in
    self._scanner.feed(handle, self._consumer)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 338, in
    self._scan_record(uhandle, consumer)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 343, in
    fn(self, uhandle, consumer)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 483, in
    self._scan_line('SQ', uhandle, consumer.sequence_header, exactly_one=1)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 365, in
    read_and_call(uhandle, event_fn, start=line_type)
  File "/sw/lib/python2.4/site-packages/Bio/ParserSupport.py", line 300, in
    raise SyntaxError, errmsg
SyntaxError: Line does not start with 'SQ':
PE   3: Inferred from homology;

I have found that this is caused by lines in the file that start with "PE"
(as in the example) or with "**". Once I remove these lines, the file parses
without problems. In my opinion the parser should deal more gracefully with
records that contain line types it does not recognize.
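
Until the parser handles these lines, the workaround described above (removing
the "PE" and "**" lines before parsing) could be scripted roughly as follows.
This is only a sketch: strip_unsupported_lines and filter_file are hypothetical
helper names, not part of Biopython, and it assumes that the line type is always
given by the first two characters of each line.

```python
# Hypothetical helpers (not part of Biopython): drop the line types that
# trigger the SyntaxError shown in the traceback above.
UNSUPPORTED_PREFIXES = ("PE", "**")


def strip_unsupported_lines(lines, prefixes=UNSUPPORTED_PREFIXES):
    """Yield only the lines whose two-character line code is not in prefixes."""
    for line in lines:
        if line[:2] in prefixes:
            continue  # skip "PE" and "**" lines
        yield line


def filter_file(in_path, out_path):
    """Copy in_path to out_path, leaving out the unsupported lines."""
    infile = open(in_path)
    outfile = open(out_path, "w")
    try:
        for line in strip_unsupported_lines(infile):
            outfile.write(line)
    finally:
        infile.close()
        outfile.close()
```

The filtered copy written by filter_file can then be passed to testing.py in
place of the original .dat file.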


