[Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser Problem

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Mon Feb 13 08:59:53 EST 2006


http://bugzilla.open-bio.org/show_bug.cgi?id=1948





------- Comment #4 from gould at embl.de  2006-02-13 08:59 -------
(In reply to comment #3)
> I'm unclear what you meant in comment 2, Kate.
> 
> Your original bug report had the following:
> 
> SyntaxError: Line does not start with 'ID':
> <HTML LANG="EN">
> 
> This suggests that instead of getting a plain text SProt file
> (which should start 'ID'), you got an HTML file.
> 
> One reason for this MIGHT be a temporary problem with the ExPASy
> website - returning an error message in HTML.
> 
> If you still get the <HTML LANG="EN"> error message, could you
> attach the raw HTML to this bug (you could use "print results"
> at the Python prompt).
> 
> If the HTML problem has gone away on its own (which wouldn't
> surprise me if it was a temporary problem with the server) do you
> see the problem I talked about in comment 1 of the bug?
> 
> I have tried this on both Linux and Windows now; both show the
> problem described in comment 1 where the 'DT' lines do not match
> what BioPython is expecting.
> 
> Quoting your original bug report:
> > I see from the release notes that some changes were made to the
> > annotation format and suspect this is why the biopython scripts
> > are no longer happy?
> 
> Yes - this does explain the 'DT' line problem. BioPython will need
> to be updated to cope with the new format DT lines:
> 
> http://ca.expasy.org/sprot/relnotes/sp_news.html#rel7.0
> 
> Quoting:
> 
> Changes concerning dates and versions numbers (DT lines)
> 
> We changed from showing only the dates corresponding to full UniProtKB releases
> in the DT lines to displaying the date of the biweekly release at which an
> entry is integrated or updated. We dropped the information concerning the
> release number and introduced entry and sequence version numbers in the DT
> lines.
> 
> The new format of the three DT lines is:
> 
> DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
> DT   DD-MMM-YYYY, sequence version version_number.
> DT   DD-MMM-YYYY, entry version version_number.
> 
> Example for UniProtKB/Swiss-Prot:
> 
> DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
> DT   15-OCT-2001, sequence version 3.
> DT   01-APR-2004, entry version 14.
> 
> Example for UniProtKB/TrEMBL:
> 
> DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
> DT   15-OCT-2000, sequence version 2.
> DT   15-DEC-2004, entry version 5.
> 
> The sequence version number of an entry is incremented by one when its amino
> acid sequence is modified. The entry version number is incremented by one
> whenever any data in the flat file representation of the entry is modified.
> 
> We retrofitted the entry and sequence version numbers, as well as all dates,
> using archived UniProtKB releases.
> 
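
For illustration only, the three new-style DT lines quoted above could be
recognised with something like the sketch below. This is not the Biopython
parser or a proposed patch to it; the names parse_dt_line, DT_LINE and MONTHS
are hypothetical, and the pattern is based solely on the format and examples
in the quoted release notes.

```python
import re
from datetime import date

# Month abbreviations as they appear in the quoted DT examples.
MONTHS = {
    name: number
    for number, name in enumerate(
        "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1
    )
}

# Assumed pattern for the new format: a date, a comma, then one of the
# three descriptions, ending with a full stop.
DT_LINE = re.compile(
    r"^DT\s+(?P<day>\d{2})-(?P<mon>[A-Z]{3})-(?P<year>\d{4}), "
    r"(?:integrated into UniProtKB/(?P<database>[\w-]+)"
    r"|sequence version (?P<sequence_version>\d+)"
    r"|entry version (?P<entry_version>\d+))\.\s*$"
)


def parse_dt_line(line):
    """Return (date, kind, value) for one new-style DT line.

    kind is "integrated", "sequence_version" or "entry_version".
    Raises ValueError for anything else, including old-style DT lines.
    """
    match = DT_LINE.match(line)
    if match is None or match.group("mon") not in MONTHS:
        raise ValueError("Unrecognised DT line: %r" % line)
    when = date(
        int(match.group("year")),
        MONTHS[match.group("mon")],
        int(match.group("day")),
    )
    if match.group("database") is not None:
        return when, "integrated", match.group("database")
    if match.group("sequence_version") is not None:
        return when, "sequence_version", int(match.group("sequence_version"))
    return when, "entry_version", int(match.group("entry_version"))
```

For example, parse_dt_line("DT   01-APR-2004, entry version 14.") returns
(datetime.date(2004, 4, 1), 'entry_version', 14).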


Yes, I understand what you are saying now. I'm no longer getting the HTML
file; I now get a plain text SProt file, but it is not being parsed correctly.
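
In case it helps anyone telling the two failure modes in this bug apart, here
is a rough sketch of a check that could be run on the downloaded text before
parsing. It only relies on the point made above that a SProt entry should
start with 'ID'; classify_response is a hypothetical helper, not part of
Biopython.

```python
def classify_response(text):
    """Guess what the server returned, from the first non-blank line.

    Returns "sprot" if the line starts with the 'ID' line code, "html" if
    it looks like the start of an HTML page, "unknown" for anything else,
    and "empty" if there is no non-blank line at all.
    """
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith("ID"):
            return "sprot"
        if line.strip().lower().startswith(("<html", "<!doctype")):
            return "html"
        return "unknown"
    return "empty"
```

A result of "html" would point to the server problem, while "sprot" followed
by a parser error would point to a format change such as the new DT lines.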




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

