[Biopython-dev] bug: swissprot record parser errors

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Mon Aug 19 14:08:54 EDT 2002


On Mon, 19 Aug 2002, louis_coilliot wrote:

> This little program:
>
>
>
> #!/usr/bin/env python
> # reading a SwissProt entry from a file
>
> from Bio.SwissProt import SProt
> from sys import *
>
> try:
>     handle = open(argv[1])
>     sp = SProt.Iterator(handle, SProt.RecordParser())
>     record = sp.next()
>     print record.entry_name
>     print
> except:
>     print "error"
>
>
>
> doesn't work with some records, for example:
> http://www.expasy.ch/cgi-bin/get-sprot-raw.pl?O75398
> http://www.expasy.ch/cgi-bin/get-sprot-raw.pl?P41964
>
> I don't know why. Any idea ?


I see the following error message when trying your test program on
O75398, using BioPython 1.0:


###
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/tmp/python-26514l6q", line 10, in getRecordName
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 168, in next
    return self._parser.parse(File.StringHandle(data))
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 289, in parse
    self._scanner.feed(handle, self._consumer)
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 332, in feed
    self._scan_record(uhandle, consumer)
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 337, in _scan_record
    fn(self, uhandle, consumer)
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 411, in _scan_reference
    self._scan_ra(uhandle, consumer)
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 433, in _scan_ra
    one_or_more=1)
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/SwissProt/SProt.py", line 359, in _scan_line
    read_and_call(uhandle, event_fn, start=line_type)
  File "/opt/Python-2.1.1/lib/python2.1/site-packages/Bio/ParserSupport.py", line 326, in read_and_call
    raise SyntaxError, errmsg
SyntaxError: Line does not start with 'RA':
RP   LYS-304.
###


Record O75398 has two RP lines in its first reference:

###
RN   [1]
RP   SEQUENCE FROM N.A. (ISOFORMS 1 AND 3), AND MUTAGENESIS OF ARG-302 AND
RP   LYS-304.
###

I'm staring at SProt.py's parser now to see how it handles consecutive
RP's.  Hmmm.... ah!


This problem has been fixed in CVS already.  Before, the parser tried
scanning RP's like this:


### Biopython 1.0
    def _scan_rp(self, uhandle, consumer):
        self._scan_line('RP', uhandle, consumer.reference_position,
                        exactly_one=1)
###



but in CVS, this has been corrected to:

### Biopython CVS
    def _scan_rp(self, uhandle, consumer):
        self._scan_line('RP', uhandle, consumer.reference_position,
                        one_or_more=1)
###

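to account for multiple RP lines.  With exactly_one=1, the parser
consumed only the first RP line and then expected an RA line, which is
why it choked on the second RP line.  Try checking BioPython out from
CVS: your program should work then.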
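
To see why the flag matters, here is a minimal sketch of the scanning
logic in modern Python.  It is not Biopython's actual code: scan_line
and scan_reference are simplified stand-ins made up for illustration,
and the RA line in the sample data is a placeholder, not the real
author list from O75398.

```python
# Simplified stand-in for a line-type scanner; NOT Biopython's implementation.
def scan_line(line_type, lines, pos, callback,
              exactly_one=False, one_or_more=False):
    """Consume line(s) starting with line_type at lines[pos]; return new pos."""
    if exactly_one == one_or_more:
        raise ValueError("pass exactly one of exactly_one / one_or_more")

    def matches(i):
        return i < len(lines) and lines[i].startswith(line_type)

    if not matches(pos):
        found = lines[pos] if pos < len(lines) else "<end of input>"
        raise SyntaxError("Line does not start with %r:\n%s" % (line_type, found))

    # The first matching line is always consumed.
    callback(lines[pos])
    pos += 1
    # Only one_or_more keeps going while the prefix still matches.
    while one_or_more and matches(pos):
        callback(lines[pos])
        pos += 1
    return pos


def scan_reference(lines, rp_kwargs):
    """Scan RN, then RP (mode chosen by the caller), then RA lines."""
    seen = []
    pos = scan_line("RN", lines, 0, seen.append, exactly_one=True)
    pos = scan_line("RP", lines, pos, seen.append, **rp_kwargs)
    pos = scan_line("RA", lines, pos, seen.append, one_or_more=True)
    return seen


REFERENCE = [
    "RN   [1]",
    "RP   SEQUENCE FROM N.A. (ISOFORMS 1 AND 3), AND MUTAGENESIS OF ARG-302 AND",
    "RP   LYS-304.",
    "RA   PLACEHOLDER A.;",  # hypothetical author line, for illustration only
]

if __name__ == "__main__":
    for mode in ({"exactly_one": True}, {"one_or_more": True}):
        try:
            print(mode, "->", len(scan_reference(REFERENCE, mode)), "lines scanned")
        except SyntaxError as err:
            print(mode, "-> SyntaxError:", err)
```

With exactly_one, the RA scan trips over the second RP line, which is
the same complaint as in the traceback; with one_or_more, both RP
lines are consumed before the RA scan starts.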



Good luck to you!



