[Bioperl-l] Re: bad entries in interpro again
Mikko Arvas
Mikko.Arvas at vtt.fi
Thu Dec 2 05:11:44 EST 2004
Hi,
At 13:49 1.12.2004 -0800, Hilmar Lapp wrote:
>On Wednesday, December 1, 2004, at 08:16 AM, Mikko Arvas wrote:
>
>>Is the &apos the source of the problem?
>Did you try to take it out and see what happens? I.e., you can answer this
>yourself easily.
>I would have thought that it's not the problem, but it'd be great if you
>or somebody else helps out by testing what was suggested.
Sorry about that; I should have tested it before mailing. The problem is not
non-ASCII characters as such; it seems to be specifically the combination of
two & inside a single <> tag. I tried various combinations: other non-ASCII
characters (even in abundance) don't break it, and neither does a single &.
Here is again the problematic line:
<interpro id="IPR002073" name="3'5'-cyclic nucleotide
phosphodiesterase" type="Domain" parent_id="IPR003607">
And its error:
not well-formed (invalid token) at line 2, column 54, byte 132
at /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi/XML/Parser.pm
line 187
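
One way to narrow this down without going through BioPerl is to feed small
fragments straight to Expat and see which ones it rejects. The following is
only a minimal sketch and not something taken from this thread: it uses
Python's xml.parsers.expat binding (an assumption; the failing stack here is
Perl's XML::Parser, which wraps the same Expat library), and the helper name
check_well_formed and the sample fragments are hypothetical stand-ins, not the
actual InterPro entry.

```python
import xml.parsers.expat


def check_well_formed(xml_text):
    """Return None if Expat accepts xml_text, else (message, line, column)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this as the final (and only) chunk of the document.
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (message, err.lineno, err.offset)
    return None


# Hypothetical fragments: properly terminated entities vs. a bare ampersand.
escaped = '<interpro id="X" name="3&apos;5&apos;-cyclic"/>'
bare = '<interpro id="X" name="a & b"/>'

for label, fragment in (("escaped", escaped), ("bare", bare)):
    result = check_well_formed(fragment)
    print(label, "->", "ok" if result is None else result)
```

Running variants of the problematic line through such a check, changing one
character at a time, should show whether Expat itself rejects the input or
whether the text is being altered before it reaches the parser.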
So which way to proceed?
>>Is it really a problem in BioPerl or in expat?
>
>If the problem is outside of Interpro, it's Expat, not Bioperl. It's the
>XML parser library that threw up.
>
>> Is somebody trying to solve the problem for Bioperl now
>>and is there any sensible thing that the interpro team could do to help?
>
>Depends on where the problem is. It appears that the Interpro team already
>eliminated the double quotes in names. There is some hard-coded stuff in
>interpro.pm that needs to be removed, and I heard Allen say he'll work on
>that.
>
> -hilmar
Cheers,
mikko
Mikko Arvas
VTT Biotechnology
e-mail: mikko.arvas at vtt.fi
tel: +358-(0)9-456 5827
mobile: +358-(0)44-381 0502
fax: +358-(0)9-455 2103
mail: Tietotie 2, Espoo
P.O. Box 1500
FIN-02044 VTT, Finland