[Biopython-dev] Merging Uniprot XML parser?

Peter biopython at maubp.freeserve.co.uk
Tue Oct 19 15:54:43 UTC 2010


Hi all,

I've fixed a few issues I felt were holding up merging Andrea's UniProt
XML parser.

I've now tested that uniprot_sprot.txt and uniprot_sprot.xml are parsed
into more or less equivalent objects, and that these can be written out
as GenBank (well, GenPept) files or as EMBL/IMGT files (given recent
work to support protein EMBL files - which do exist but are rarely used).

This required "fixing" Bug 3026 to cope with long annotation that cannot
be line wrapped nicely (lots of long URL strings in UniProt XML comments).
http://bugzilla.open-bio.org/show_bug.cgi?id=3026
I'm tempted to remove the warning because it is so common... or make
it use the same text each time so you get warned once.
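
To illustrate the "same text each time" option: under Python's default
warning filter, a warning is shown once per code location for each distinct
message text, so a constant message is reported only once, while a message
that embeds the offending text is reported again for every new value. The
following is a minimal sketch of that idea only, not the actual Bug 3026
change; the function name, the width of 58 and the message wording are all
hypothetical.

```python
import textwrap
import warnings

# Hypothetical width available for annotation text on one output line.
MAX_WIDTH = 58


def wrap_annotation(text, width=MAX_WIDTH):
    """Wrap annotation text, warning if some word is too long to fit.

    Long words (e.g. URLs) are left unbroken, so a line may exceed width.
    """
    lines = textwrap.wrap(
        text, width=width, break_long_words=False, break_on_hyphens=False
    )
    if any(len(line) > width for line in lines):
        # Constant message text: with the "default" filter action this
        # is reported once for this location, however many records hit it.
        warnings.warn("Annotation too long to wrap nicely", UserWarning)
    return lines
```

If the message instead included the annotation itself, each distinct
annotation would count as a new warning and be reported separately.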

There are also some additions to the Bio.SeqFeature position classes,
since SwissProt/UniProt files can have uncertain positions.

Could someone take a look at the code here (a rebased branch), as I'd
like some independent testing (and better yet, code review):
http://github.com/peterjc/biopython/tree/uniprot

Thanks,

Peter


