[Biopython-dev] New: Uniprot XML parser

Peter Cock p.j.a.cock at googlemail.com
Wed Jan 20 17:14:18 UTC 2010


On Wed, Jan 20, 2010 at 4:57 PM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:
>>
>> Something I should have mentioned earlier (I forgot this wasn't
>> checked in yet) was feature support in the existing "swiss" plain
>> text parser - hopefully we can get that working nicely as part of
>> this XML work:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2235
>>
>> Peter
>>
>
> I know that the plain text swissprot parser can parse features, but
> last time I checked these features were not included in SeqRecords
> generated by Bio.SeqIO.
> If the two parsers are to report similar results, then the 'swiss'
> format in Bio.SeqIO must report features too.

Yes, there is an old patch on Bug 2235 to do this:
http://bugzilla.open-bio.org/show_bug.cgi?id=2235

> I made a few changes to the original XML parser (available on github)
> to map the data as closely as possible to the plain text parser.
>
> However, the big issue is going to be the comment field:
> - one big string in the plain text parser
> - several annotation fields in the XML parser.
>
> I think that obtaining identical results is going to be difficult.
> Splitting the big string into many annotations is hard (and very
> error-prone), and merging many annotations into a single string is
> also hard...
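To illustrate why splitting the single comment string is fragile, here is a naive sketch. It assumes the plain text parser's comment is one string holding one "TOPIC: text" entry per line (roughly what the "CC   -!- TOPIC: text" blocks of a Swiss-Prot flat file look like once the prefix is stripped), and it uses a hypothetical "comment_<topic>" key scheme for the XML-style annotations. Neither assumption is taken from the actual Biopython parsers.

```python
def split_comment(comment):
    """Naively split one plain-text comment string into per-topic lists.

    Assumes one "TOPIC: text" entry per line; the "comment_<topic>" key
    scheme is hypothetical, chosen only for illustration.
    """
    annotations = {}
    for line in comment.splitlines():
        topic, sep, text = line.partition(": ")
        if not sep:
            # No "TOPIC: " prefix (e.g. a wrapped continuation line):
            # we cannot tell which topic it belongs to.
            annotations.setdefault("comment_unparsed", []).append(line)
            continue
        # e.g. "SUBCELLULAR LOCATION" -> "comment_subcellularlocation"
        key = "comment_" + topic.strip().lower().replace(" ", "")
        annotations.setdefault(key, []).append(text.strip())
    return annotations


def join_comments(annotations):
    """Flatten per-topic lists back into a single comment string."""
    lines = []
    for key, texts in annotations.items():
        if not key.startswith("comment_"):
            continue  # not a comment annotation
        topic = key[len("comment_"):].upper()
        for text in texts:
            # Unparsed lines are emitted as-is; the rest get a "TOPIC: " prefix.
            lines.append(text if key == "comment_unparsed" else topic + ": " + text)
    return "\n".join(lines)


if __name__ == "__main__":
    comment = ("FUNCTION: Binds ATP.\n"
               "SUBCELLULAR LOCATION: Cytoplasm.\n"
               "SIMILARITY: Belongs to the kinase family.")
    parts = split_comment(comment)
    print(parts)
    print(join_comments(parts) == comment)  # False: the round trip is lossy
```

The round trip loses information in both directions: "SUBCELLULAR LOCATION" comes back as "SUBCELLULARLOCATION", and a wrapped line that happens to contain ": " would be mistaken for the start of a new topic.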
>
> Anyhow, unit tests are coming (thanks to Mauro), together with a detailed
> comparison between the SeqRecords produced by the two parsers.
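One simple starting point for such a comparison is to diff the annotations dictionaries (the SeqRecord.annotations attribute) of records from the two parsers, plus their features reduced to (type, start, end) tuples. This is a minimal sketch under stated assumptions, not the planned test code: the function names and the tuple representation of features are hypothetical, and building those tuples from each SeqFeature's type and location is left to the caller.

```python
def compare_annotations(plain, xml):
    """Report how two annotation dicts (e.g. record.annotations) differ."""
    plain_keys, xml_keys = set(plain), set(xml)
    return {
        # keys only the plain text parser produced
        "only_plain": sorted(plain_keys - xml_keys),
        # keys only the XML parser produced
        "only_xml": sorted(xml_keys - plain_keys),
        # keys both parsers produced, but with different values
        "different": sorted(k for k in plain_keys & xml_keys if plain[k] != xml[k]),
    }


def compare_features(plain, xml):
    """Compare features given as hypothetical (type, start, end) tuples."""
    plain_set, xml_set = set(plain), set(xml)
    return {
        "only_plain": sorted(plain_set - xml_set),
        "only_xml": sorted(xml_set - plain_set),
    }


if __name__ == "__main__":
    plain_ann = {"accessions": ["P12345"], "comment": "FUNCTION: Binds ATP."}
    xml_ann = {"accessions": ["P12345"], "comment_function": ["Binds ATP."]}
    print(compare_annotations(plain_ann, xml_ann))
```

Reporting keys that exist on only one side separately from keys whose values differ makes the structural mismatch (one "comment" string versus several per-topic keys) visible at a glance.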

Great.

Peter


