[Bioperl-l] Re: bad entries in interpro again

Hilmar Lapp hlapp at gnf.org
Thu Dec 2 16:45:52 EST 2004


On Dec 2, 2004, at 6:04 AM, Dave Howorth wrote:

> The file contains many lines identical to the one cited, which are all 
> valid XML in accordance with the Interpro DTD, but none are line 2! So 
> it looks like different data has been passed to XML::Parser.

Well, yes, you can't translate the line number given by the error 
message into a line number in the source file. SeqIO::interpro chops up 
the input at <protein>...</protein> and then passes each chunk to the 
XML::Parser instance, so the reported line number is relative to the 
chunk being parsed.
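
To illustrate why the numbers differ, here is a rough sketch in plain 
Perl. It is not the code in SeqIO::interpro; protein_chunks() and 
source_line() are hypothetical helpers, and the sketch assumes the 
parser counts lines from the start of each <protein> chunk with nothing 
prepended to it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split the input text into
# <protein>...</protein> chunks and record the 1-based source line on
# which each chunk starts.
sub protein_chunks {
    my ($text) = @_;
    my @chunks;
    while ($text =~ m{(<protein\b.*?</protein>)}gs) {
        my $chunk  = $1;
        my $before = substr($text, 0, $-[1]);     # everything ahead of the chunk
        my $start  = 1 + ($before =~ tr/\n//);    # count newlines before it
        push @chunks, { xml => $chunk, start_line => $start };
    }
    return @chunks;
}

# Hypothetical helper: map a line number relative to a chunk back to a
# line number in the source text (assumes line 1 is the chunk's first line).
sub source_line {
    my ($chunk, $chunk_line) = @_;
    return $chunk->{start_line} + $chunk_line - 1;
}

my $text = join "\n",
    '<interprodb>',
    '<protein id="A">',
    '<match/>',
    '</protein>',
    '<protein id="B">',
    '<match/>',
    '</protein>',
    '</interprodb>', '';

my @chunks = protein_chunks($text);
printf "chunk %d starts at source line %d\n", $_ + 1, $chunks[$_]{start_line}
    for 0 .. $#chunks;
```

Under those assumptions, an error reported at line 2 of the second chunk 
would correspond to source_line($chunks[1], 2) in the original file.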

There is no other editing of the chunks going on, though, except for a 
haphazard substitution of certain double quotes. To see the chunk 
before it gets sent to the parser instance, edit 
Bio/SeqIO/interpro.pm and, before the line

	  $self->parse_xml($xml_fragment);

put a print statement that prints out the content of $xml_fragment. 
That should also give the exact source XML that trips up the parser.
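
A minimal sketch of such a print statement follows. dump_fragment() is 
a hypothetical helper, not something that exists in BioPerl; it numbers 
the lines of the chunk so that the "line N" in the parser's message can 
be matched against the output. A bare print STDERR $xml_fragment; 
would do as well.

```perl
use strict;
use warnings;

# Hypothetical debugging helper (not part of BioPerl): return the chunk
# with its own line numbers, framed by begin/end markers.
sub dump_fragment {
    my ($xml_fragment) = @_;
    my $n   = 0;
    my $out = "----- begin fragment -----\n";
    for my $line (split /\n/, $xml_fragment) {
        $out .= sprintf("%4d: %s\n", ++$n, $line);
    }
    $out .= "----- end fragment -----\n";
    return $out;
}

# In Bio/SeqIO/interpro.pm this would go directly before
#     $self->parse_xml($xml_fragment);
# as
#     print STDERR dump_fragment($xml_fragment);

# Stand-alone demonstration with made-up data:
my $xml_fragment = qq{<protein id="P1">\n<match/>\n</protein>};
print STDERR dump_fragment($xml_fragment);
```

Printing to STDERR keeps the debug output apart from anything the 
script writes to STDOUT.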

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


