[Biopython-dev] [Bug 2403] prosite parser can't handle new PROSITE/PRORULE format

bugzilla-daemon at portal.open-bio.org
Tue Nov 20 12:30:51 UTC 2007


http://bugzilla.open-bio.org/show_bug.cgi?id=2403





------- Comment #5 from holger.dinkel at gmail.com  2007-11-20 07:30 EST -------
Created an attachment (id=816)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=816&action=view)
fixed version (but with little hack; might not suit everybody)

Some errors are still raised when scanning the whole prosite_20.dat:

First, the PROSITE team has also introduced a new field called
"postprocessing", which the parser does not recognize and chokes on.

Second, the parser breaks on certain comment lines containing author names,
of the form "CC /AUTHOR=K_Hofmann; N_Hulo" (PROSITE accession PS50293). The
comment is split into columns, and each column is then split into a qualifier
and a value at the "=" character. Because the "N_Hulo" column has no
"/AUTHOR=" prefix, it contains no "=", and an error is raised.
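
The second failure can be reproduced outside the parser. The following is
only a minimal sketch: it assumes the comment line is split into columns on
";" (the splitting code itself is not quoted in this report), and the helper
name parse_col is hypothetical. Only the unpacking statement is taken from
the parser.

```python
# Hypothetical stand-alone reproduction; this is not the parser's actual code.
line = "CC /AUTHOR=K_Hofmann; N_Hulo"

# Assumption: the "CC" prefix is dropped and the rest is split on ";".
cols = line[2:].strip().split(";")


def parse_col(col):
    # The unpacking statement quoted from the parser: it needs exactly
    # two pieces, so a column without "=" cannot be unpacked.
    qual, data = [word.lstrip() for word in col.split("=")]
    return qual, data


first = parse_col(cols[0])  # the "/AUTHOR=K_Hofmann" column parses fine

try:
    parse_col(cols[1])      # the " N_Hulo" column has no "="
    error = None
except ValueError as exc:
    error = exc
```

The first column yields a qualifier and a value; the second produces a
one-element list, so the two-name unpacking fails.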

I fixed the first problem in the straightforward way, as Peter did, by adding
a postprocessing entry.

I could also solve the second problem, but only with some hack which might not
suit everybody:

First, I split the statement
    qual, data = [word.lstrip() for word in col.split("=")]
into two, to avoid the unpacking error (a ValueError) raised when a column
contains no "=":
    qual = [word.lstrip() for word in col.split("=")][0]
    data = ''.join([word.lstrip() for word in col.split("=")][1:])

Then I introduced a hack to work around the aforementioned problem.

I changed
    if qual == '/TAXO-RANGE':

to
    if qual == 'N_Hulo':
        continue
    elif qual == '/TAXO-RANGE':
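
A less name-specific variant of this hack would skip any column that has no
"=" at all, rather than testing for 'N_Hulo'. The following is only a sketch
of that idea, not the attached patch: the function names split_qualifier and
parse_comment_columns are hypothetical, and like the hack above it silently
drops the second author instead of attaching it to the /AUTHOR value.

```python
def split_qualifier(col):
    """Return (qual, data) for a 'qual=data' column, or None if there is no '='."""
    # str.partition splits at the first "=" only, so any further "="
    # characters stay inside the data part.
    qual, sep, data = col.strip().partition("=")
    if not sep:
        return None
    return qual.strip(), data.strip()


def parse_comment_columns(cols):
    """Collect (qual, data) pairs, skipping columns that carry no qualifier."""
    pairs = []
    for col in cols:
        parsed = split_qualifier(col)
        if parsed is None:
            continue  # e.g. "N_Hulo", a bare continuation of /AUTHOR
        pairs.append(parsed)
    return pairs


pairs = parse_comment_columns(["/AUTHOR=K_Hofmann", " N_Hulo", " /TAXO-RANGE=??E??"])
```

Whether skipping such columns is acceptable, or whether the extra names
should be appended to the preceding /AUTHOR value, is a design decision for
the parser maintainers.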

