[Biopython-dev] Uniprot XML parser on TrEmbl

Peter biopython at maubp.freeserve.co.uk
Wed Nov 24 18:03:03 UTC 2010


Hi Andrea,

I *think* I have fixed the problem with empty names in the UniProt XML
format, without affecting the unit tests, but I don't have the 62GB free to
unpack uniprot_trembl.xml.gz to try it out:

https://github.com/biopython/biopython/commit/bb971b2a7384d42d9a6e4994e59299a90e6cc700

Would you be able to retest the trunk code on that please?

I also changed the handling of the organism host (where present)
in both the UniProt and SwissProt parsers to be more consistent.
I've checked uniprot_sprot.dat still parses, but haven't tried the
much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so
again, would you be able to retest the "swiss" text parser too?
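
If disk space is a problem at your end too, the files could probably
be streamed straight from the .gz archives rather than unpacked first.
A minimal sketch follows. The helper name count_records and its
arguments are made up for illustration, and the choice of text or
binary mode is an assumption (newer Biopython releases may want a
binary handle for the XML format):

```python
import gzip


def count_records(path, fmt, mode="rt", parse=None):
    """Stream a gzipped file through a parser and count the records.

    count_records is a hypothetical helper, not part of Biopython.
    """
    if parse is None:
        # Imported lazily so the helper can be tried without Biopython.
        from Bio import SeqIO
        parse = SeqIO.parse
    count = 0
    # gzip.open decompresses on the fly, so nothing is written to disk.
    with gzip.open(path, mode) as handle:
        for _record in parse(handle, fmt):
            count += 1
    return count


if __name__ == "__main__":
    # Assumption: the text parser takes a text-mode handle; the XML
    # parser may need mode="rb" depending on the Biopython version.
    print(count_records("uniprot_trembl.dat.gz", "swiss"))
    print(count_records("uniprot_trembl.xml.gz", "uniprot-xml", mode="rb"))
```

Any exception raised part way through would point at the offending
record, which is really what we are after here.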

Many thanks,

Peter

P.S. Did you get any reply from UniProt about the apparent error in
the Q2LEH1 record within uniprot_trembl.xml.gz?


