[Bioperl-l] Re: bad entries in interpro again
Mikko.Arvas at vtt.fi
Thu Dec 2 05:11:44 EST 2004
At 13:49 1.12.2004 -0800, Hilmar Lapp wrote:
>On Wednesday, December 1, 2004, at 08:16 AM, Mikko Arvas wrote:
>>Is the &apos the source of the problem?
>Did you try to take it out and see what happens? I.e., you can answer this
>I would have thought that it's not the problem, but it'd be great if you
>or somebody else helps out by testing what was suggested.
Sorry about that, I should have tested it before mailing. The problem is not
non-ASCII characters; it seems to be specifically the combination of two &
inside a single <> tag. I tried various combinations: other non-ASCII
characters (even in abundance) don't break it, and neither does a single &.
Here again is the problematic line:
<interpro id="IPR002073" name="3'5'-cyclic nucleotide
phosphodiesterase" type="Domain" parent_id="IPR003607">
And its error:
not well-formed (invalid token) at line 2, column 54, byte 132
So which way to proceed?
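
One way to check whether a given line is rejected by expat itself, independent of BioPerl, is to feed it straight to the parser. The following is only a minimal sketch, assuming Python's xml.parsers.expat binding (which wraps the same expat library); the function name check_well_formed and both sample strings are made up for illustration and are not taken from the InterPro file.

```python
import xml.parsers.expat as expat


def check_well_formed(xml_text):
    """Return None if expat accepts xml_text, otherwise a tuple of
    (error message, line, column, byte index) describing the failure."""
    parser = expat.ParserCreate()
    try:
        # True marks this chunk as the final (and only) piece of input.
        parser.Parse(xml_text.encode("utf-8"), True)
    except expat.ExpatError as err:
        return (
            expat.ErrorString(err.code),  # e.g. "not well-formed (invalid token)"
            err.lineno,                   # line of the error
            err.offset,                   # column of the error
            parser.ErrorByteIndex,        # byte offset of the error
        )
    return None


# Hypothetical samples: two properly terminated &apos; entities in one
# attribute value, and a bare & that does not start a valid reference.
escaped = '<interpro id="IPR002073" name="3&apos;5&apos;-cyclic"/>'
bare_amp = '<interpro id="IPR002073" name="cyclic & nucleotide"/>'

for sample in (escaped, bare_amp):
    print(check_well_formed(sample))
```

Running candidate lines through such a check, one variation at a time, would show whether the rejection comes from the XML parser library or from code layered on top of it.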
>>Is it really a problem in BioPerl or in expat?
>If the problem is outside of Interpro, it's Expat, not Bioperl. It's the
>XML parser library that threw up.
>> Is somebody trying to solve the problem for Bioperl now
>>and is there any sensible thing that the interpro team could do to help?
>Depends on where the problem is. It appears that the Interpro team already
>eliminated the double quotes in names. There is some hard-coded stuff in
>interpro.pm that needs to be removed, and I heard Allen say he'll work on
e-mail: mikko.arvas at vtt.fi
tel: +358-(0)9-456 5827
mobile: +358-(0)44-381 0502
fax: +358-(0)9-455 2103
mail: Tietotie 2, Espoo
P.O. Box 1500
FIN-02044 VTT, Finland