[Biopython-dev] New: Uniprot XML parser

Peter Cock p.j.a.cock at googlemail.com
Fri Jan 15 11:08:32 UTC 2010


On Fri, Jan 15, 2010 at 10:35 AM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:
>>
>> However, the comment parsing in the plain text "swiss"
>> format is currently a little simplistic - partly to match
>> what BioPerl did at the time. We can revisit that as
>> part of this work.
>>
>
> The main problem here is going to be the comment fields, which the
> plain text parsers return as a single string (this is what pushed me to
> write the new parser). I tried to keep comment parsing as simple as
> possible by just using lists of strings (good for BioSQL), but many comment
> types would be better parsed into a dictionary tree.

I think BioPerl now uses some kind of nested tree when parsing the
SwissProt comment block, and I would like us to use something
compatible (e.g. a dictionary tree) in the "swiss" parser (and thus
also the XML parser) in such a way that we end up saving this in
BioSQL the same way.
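
To make the "list of strings" versus "dictionary tree" distinction
concrete, here is a minimal sketch of one level of such a tree. It is
not what either parser does today: the function name comments_to_tree,
the "TOPIC: text" string layout and the sample comments are all
assumptions made up for illustration.

```python
from collections import defaultdict


def comments_to_tree(comments):
    """Group flat 'TOPIC: text' comment strings by topic (hypothetical helper)."""
    tree = defaultdict(list)
    for comment in comments:
        topic, sep, text = comment.partition(": ")
        if not sep:
            # No recognisable topic prefix; keep the string under a catch-all key.
            topic, text = "UNKNOWN", comment
        tree[topic.strip().lower()].append(text.strip())
    return dict(tree)


# Made-up sample data, not taken from a real SwissProt record.
flat_comments = [
    "FUNCTION: Hypothetical example text.",
    "SUBUNIT: Homodimer.",
    "SUBUNIT: Interacts with a made-up partner.",
]
comment_tree = comments_to_tree(flat_comments)
```

The flat list keeps every comment as an opaque string, while the
grouped form lets a caller look up all comments of one type directly.
A real tree would need further nesting for the structured comment types.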

> For now I have left in an option to get back the full XML for each
> comment, by calling:
>
> UniprotIO.UniprotIterator(handle,return_raw_comments=True)
>
> so all the information in the XML file can be returned and the end user
> can decide how to parse that additional information.
>
> Anyhow, I think it is better to discuss this when the 'swiss' vs
> 'uniprot' unit test is ready.

+1, good plan.
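
As a rough idea of how an end user might handle a raw comment, here is
a sketch that turns one XML fragment into a nested dictionary using
only the standard library. It assumes the raw comment is available as
an XML string; the helper name element_to_dict and the sample fragment
are invented for illustration, and the fragment is simplified (no XML
namespace).

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an XML element into a nested dict (hypothetical helper)."""
    node = {"tag": elem.tag, "attrib": dict(elem.attrib)}
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    return node


# Made-up, simplified fragment; real UniProt XML is namespaced.
raw_comment = '<comment type="function"><text>Made-up example text.</text></comment>'
parsed = element_to_dict(ET.fromstring(raw_comment))
```

Nothing is dropped from the fragment except tail text and whitespace,
so the same helper works for any comment type without type-specific code.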

Peter


