[Biopython-dev] New: Uniprot XML parser

Tue Sep 14 17:58:32 UTC 2010

On Tue, Sep 14, 2010 at 5:22 PM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:
>
> Hi Peter,
> I've commented on your commits directly on GitHub, basically agreeing with
> them.

Thanks.

> Parsing PDB structures as positional features was done to capture all the
> information in the UniProt file. I do not see any better place than a
> SeqFeature for positional information; the only alternative here is to skip it.

We could put the DB cross reference into the dbxrefs list, but that only
captures a tiny part of the data. We could also put it in the annotations,
but that loses the benefits of the position information. Maybe using a
SeqFeature is the best plan...

> I saw in your repository you are using the string "uniprot-xml" to call
> the parser. However, the format name at the EBI REST and SOAP services
> is simply "uniprotxml". Take a look at:
>
> http://www.ebi.ac.uk/Tools/webservices/services/dbfetch_rest
>
> I think it is better to be conservative in this.

On the other hand, "uniprot-xml" fits well with the idea of "format-variant".
Whatever we go with will have downsides.

> I'm still working on SeqIO.index to make a faster implementation. Regular
> expressions are really slow, and ElementTree should cope well with this task.
> Anyhow, it works with the current implementation, so it's not a big deal.

I don't know enough about ElementTree to help right now, sorry.
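
For reference only, the general streaming idea would presumably look
something like the rough sketch below, which uses the standard library's
xml.etree.ElementTree.iterparse to walk a file one entry at a time. The
namespace string, the helper names (iter_entries, list_accessions) and the
sample data are all assumptions made up for illustration, not code from the
parser under discussion.

```python
import io
import xml.etree.ElementTree as ET

# Assumed namespace for UniProt XML; check it against the real files.
NS = "{http://uniprot.org/uniprot}"


def iter_entries(handle):
    """Yield (first accession, entry element) pairs, one entry at a time."""
    for event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == NS + "entry":
            # findtext returns the text of the first matching child element.
            accession = elem.findtext(NS + "accession")
            yield accession, elem
            # Runs when the caller asks for the next entry: drop this
            # entry's children so memory use stays roughly constant.
            elem.clear()


def list_accessions(handle):
    """Collect the first accession of every entry (hypothetical helper)."""
    return [accession for accession, _ in iter_entries(handle)]


# Made-up sample data, just enough structure to exercise the functions.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry><accession>X00001</accession><name>FAKE1</name></entry>
  <entry><accession>X00002</accession><accession>X00003</accession><name>FAKE2</name></entry>
</uniprot>"""

if __name__ == "__main__":
    print(list_accessions(io.BytesIO(SAMPLE)))
```

Note that this only streams through the entries. As far as I can tell it
does not give the file offset of each entry, which an index would also need
for random access, so it is at best a starting point.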

Peter