[Biopython-dev] [Bug 1948] uniprot release 49/SProt.Record Parser Problem

bugzilla-daemon at portal.open-bio.org
Mon Feb 13 07:43:21 EST 2006


http://bugzilla.open-bio.org/show_bug.cgi?id=1948


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         OS/Version|Linux                       |All




------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2006-02-13 07:43 -------
Kate, I'm unclear what you meant in comment 2.

Your original bug report had the following:

SyntaxError: Line does not start with 'ID':
<HTML LANG="EN">

This suggests that instead of getting a plain text SProt file
(which should start 'ID'), you got an HTML file.

One reason for this MIGHT be a temporary problem with the ExPASy
website - returning an error message in HTML.

If you still get the <HTML LANG="EN"> error message, could you
attach the raw HTML to this bug? (You could use "print results"
at the Python prompt to display it.)
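
A quick way to tell the two cases apart before handing the text to the
parser might look like the sketch below. It is only an illustration: the
function names are made up for this comment and are not part of Biopython,
and the test for HTML is a rough heuristic.

```python
def looks_like_sprot(text):
    """Return True if the first non-blank line starts with 'ID'.

    Hypothetical helper, not a Biopython function. It mirrors the check
    behind the parser's "Line does not start with 'ID'" error.
    """
    for line in text.splitlines():
        if line.strip():
            return line.startswith("ID")
    return False  # empty or whitespace-only input


def describe_response(text):
    """Give a rough, human-readable guess at what the server returned."""
    if looks_like_sprot(text):
        return "plain text SProt record"
    # Heuristic only: an HTML page normally opens with one of these tags.
    if text.lstrip().lower().startswith(("<html", "<!doctype")):
        return "HTML page (possibly a server error message)"
    return "unrecognised content"
```

Calling describe_response(results) on the text you downloaded would then
say whether it is worth attaching as an HTML error page.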

If the HTML problem has gone away on its own (which wouldn't
surprise me if it was a temporary problem with the server), do you
see the problem I talked about in comment 1 of the bug?

I have now tried this on both Linux and Windows. Both show the
problem described in comment 1, where the 'DT' lines do not match
what Biopython is expecting.

Quoting your original bug report:
> I see from the release notes that some changes were made to the
> annotation format and suspect this is why the biopython scripts
> are no longer happy?

Yes - this does explain the 'DT' line problem. Biopython will need
to be updated to cope with the new format DT lines:

http://ca.expasy.org/sprot/relnotes/sp_news.html#rel7.0

Quoting:

Changes concerning dates and versions numbers (DT lines)

We changed from showing only the dates corresponding to full UniProtKB releases
in the DT lines to displaying the date of the biweekly release at which an
entry is integrated or updated. We dropped the information concerning the
release number and introduced entry and sequence version numbers in the DT
lines.

The new format of the three DT lines is:

DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
DT   DD-MMM-YYYY, sequence version version_number.
DT   DD-MMM-YYYY, entry version version_number.

Example for UniProtKB/Swiss-Prot:

DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
DT   15-OCT-2001, sequence version 3.
DT   01-APR-2004, entry version 14.

Example for UniProtKB/TrEMBL:

DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
DT   15-OCT-2000, sequence version 2.
DT   15-DEC-2004, entry version 5.

The sequence version number of an entry is incremented by one when its amino
acid sequence is modified. The entry version number is incremented by one
whenever any data in the flat file representation of the entry is modified.

We retrofitted the entry and sequence version numbers, as well as all dates,
using archived UniProtKB releases.
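
To make the new format concrete, here is a minimal sketch of how the three
DT lines quoted above could be matched with a regular expression. This is
not the existing Biopython parser code and not a proposed patch; the
function name and the dictionary keys are invented for illustration, and
the pattern only covers the new format exactly as quoted.

```python
import re

# Matches the three new-style DT lines quoted from the release notes.
# Group 1 is the date; exactly one of the named groups is set per line.
DT_PATTERN = re.compile(
    r"^DT   (\d{2}-[A-Z]{3}-\d{4}), "
    r"(?:integrated into UniProtKB/(?P<db>[\w-]+)"
    r"|sequence version (?P<seq>\d+)"
    r"|entry version (?P<entry>\d+))\.$"
)


def parse_dt_lines(lines):
    """Parse new-format DT lines into a dict (hypothetical key names)."""
    info = {}
    for line in lines:
        match = DT_PATTERN.match(line.rstrip())
        if match is None:
            raise ValueError("Unrecognised DT line: %r" % line)
        date = match.group(1)
        if match.group("db"):
            info["integrated"] = (date, match.group("db"))
        elif match.group("seq"):
            info["sequence_version"] = (date, int(match.group("seq")))
        else:
            info["entry_version"] = (date, int(match.group("entry")))
    return info
```

Old-style DT lines (with release numbers) would raise ValueError here, so
a real fix would presumably need to accept both formats.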




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

