[Bioperl-l] Bio::SeqIO and bad entries in uniprot and interpro
Hilmar Lapp
hlapp at gmx.net
Tue Nov 23 01:29:27 EST 2004
On Monday, November 22, 2004, at 12:58 PM, Jason Stajich wrote:
>> Same goes for interpro:
>>
>> my $infeat = Bio::SeqIO->new('-file' => '<match.xml',
>> '-format' => 'interpro' );
>> while (my $feat = $infeat->next_seq) { store features etc. in here}
>>
>> After happily processing a lot of features it gives:
>> not well-formed (invalid token) at line 2, column 53, byte 131 at
>> /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm
>> line 187
Can you locate the position that raises the error? I have seen errors
like this thrown on non-ASCII characters.
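
One way to check for such characters is to scan the file for bytes outside plain ASCII text and print where they occur. The following is only a rough sketch using core Perl; the function name find_suspect_bytes is made up for illustration, and the columns and byte offsets it reports are counted from 0, which may or may not line up exactly with the numbers in the parser's message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report every byte that is not printable ASCII
# (tab, LF and CR are allowed). Returns a list of hash references.
sub find_suspect_bytes {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my @hits;
    my $offset = 0;    # byte offset of the start of the current line
    while (my $line = <$fh>) {
        while ($line =~ /([^\x09\x0A\x0D\x20-\x7E])/g) {
            my $col = pos($line) - 1;    # 0-based column of the match
            push @hits, {
                line   => $.,            # 1-based line number
                column => $col,
                byte   => $offset + $col,
                value  => sprintf('0x%02X', ord $1),
            };
        }
        $offset += length $line;
    }
    close $fh;
    return @hits;
}

# Usage: perl find_suspect_bytes.pl match.xml
if (@ARGV) {
    printf "line %d, column %d, byte %d: %s\n",
        @{$_}{qw(line column byte value)}
        for find_suspect_bytes($ARGV[0]);
}
```

If nothing is reported near the position in the error message, the cause is probably something else, for instance an unescaped '&' or '<' in the data.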
>>
>> I guess it's no wonder that such big DBs have errors or are out of
>> sync with perl modules etc. and I don't mind losing one seq or
>> feature here or there. The files are rather big so fixing them
>> manually is a bit painful. But I need to somehow get most things
>> processed, is there a way to skip these bad entries or would you have
>> some other smart ideas?
>>
Since XML::Parser is built on top of expat, there is really no way to
recover from a well-formedness violation and resume parsing of
the document.
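
Because the parser cannot resume, any workaround has to happen before parsing. As a minimal sketch, and only under the assumption that the invalid tokens really are stray non-ASCII bytes, the file could be copied once with those bytes replaced. The name scrub_non_ascii and the output file name are hypothetical. Note that this is lossy: it also replaces legitimate multi-byte characters, and it does nothing for other kinds of well-formedness errors.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, replacing every byte that is not
# printable ASCII (tab, LF and CR are kept) with '?'.
# Returns the number of bytes replaced.
sub scrub_non_ascii {
    my ($in, $out) = @_;
    open(my $src, '<:raw', $in)  or die "Cannot read $in: $!";
    open(my $dst, '>:raw', $out) or die "Cannot write $out: $!";
    my $replaced = 0;
    while (my $line = <$src>) {
        # tr with /c works on the complement of the listed characters
        # and returns how many characters it changed.
        $replaced += ($line =~ tr/\x09\x0A\x0D\x20-\x7E/?/c);
        print {$dst} $line;
    }
    close $src;
    close $dst or die "Cannot close $out: $!";
    return $replaced;
}

# Usage (assumed file names): perl scrub.pl match.xml match.clean.xml
if (@ARGV == 2) {
    my $n = scrub_non_ascii(@ARGV);
    print "replaced $n byte(s)\n";
}
```

The cleaned copy can then be passed to Bio::SeqIO->new in place of the original file, at the cost of '?' appearing wherever a byte was replaced.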
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------