[Bioperl-l] Re: bad entries in interpro again

Mikko Arvas Mikko.Arvas at vtt.fi
Thu Dec 2 05:11:44 EST 2004


At 13:49 1.12.2004 -0800, Hilmar Lapp wrote:

>On Wednesday, December 1, 2004, at 08:16  AM, Mikko Arvas wrote:
>>Is the &apos the source of the problem?
>Did you try to take it out and see what happens? I.e., you can answer this 
>yourself easily.
>I would have thought that it's not the problem, but it'd be great if you 
>or somebody else helps out by testing what was suggested.

Sorry about that, I should have tested it before mailing. The problem is not 
non-ASCII characters; it seems to be specifically the combination of two & 
inside a single <> tag. I tried various combinations: other non-ASCII 
characters (even in abundance) don't break it, and neither does a single &.

Here is again the problematic line:
<interpro id="IPR002073" name="3&apos;5&apos;-cyclic nucleotide 
phosphodiesterase" type="Domain" parent_id="IPR003607">

And its error:
not well-formed (invalid token) at line 2, column 54, byte 132 
at  /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi/XML/Parser.pm
line 187
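
One way to narrow this down is to feed the element to expat on its own,
outside of BioPerl. The following is only a minimal sketch: it uses Python's
xml.parsers.expat binding rather than the Perl XML::Parser from the error
above, the helper name check_well_formed is made up for illustration, and a
closing </interpro> tag has been added so that the fragment is a complete
document. If expat accepts the element as written here, the input reaching the
parser in the failing run probably differs from the line quoted above.

```python
import xml.parsers.expat


def check_well_formed(xml_text):
    """Return (True, None) if expat accepts xml_text, else (False, message).

    Hypothetical helper for illustration; not part of BioPerl or InterPro.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # Second argument True marks this as the final (only) chunk of input.
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        return False, str(err)
    return True, None


# The problematic element from the InterPro file, closed so it stands alone.
element = (
    '<interpro id="IPR002073" '
    'name="3&apos;5&apos;-cyclic nucleotide phosphodiesterase" '
    'type="Domain" parent_id="IPR003607"></interpro>'
)

# Variant in which the entities have been reduced to bare ampersands,
# as might happen if something rewrites the text before parsing.
mangled = element.replace("&apos;", "&")

for label, text in [("with &apos; entities", element),
                    ("with bare ampersands", mangled)]:
    ok, message = check_well_formed(text)
    print(label, "->", "well-formed" if ok else message)
```

Comparing the two variants separates "expat cannot handle two entities in one
tag" from "something altered the entities before expat saw them".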

So which way to proceed?

>>Is it really a problem in BioPerl or in expat?
>If the problem is outside of Interpro, it's Expat, not Bioperl. It's the 
>XML parser library that threw up.
>>  Is somebody trying to solve the problem for Bioperl now
>>and is there any sensible thing that the interpro team could do to help?
>Depends on where the problem is. It appears that the Interpro team already 
>eliminated the double quotes in names. There is some hard-coded stuff in 
>interpro.pm that needs to be removed, and I heard Allen say he'll work on 
>         -hilmar


Mikko Arvas
VTT Biotechnology

e-mail:            mikko.arvas at vtt.fi
tel:                 +358-(0)9-456 5827
mobile:           +358-(0)44-381 0502
fax:                +358-(0)9-455 2103
mail:               Tietotie 2, Espoo
                       P.O. Box 1500
                       FIN-02044 VTT, Finland
