[Bioperl-l] bad entries in interpro
Hilmar Lapp
hlapp at gmx.net
Sat Nov 27 01:06:57 EST 2004
On Tuesday, November 23, 2004, at 04:30 PM, Robson Francisco de Souza
{S} wrote:
>
>>> not well-formed (invalid token) at line 2, column 53, byte 131 at
>>> /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm
>>> line 187
>
> Well, I saw no answers on the list, therefore I'm sending the
> problematic
> entry below:
>
> <protein id="O00408" name="CN2A_HUMAN" length="941"
> crc64="9797609B487FD64E">
> <interpro id="IPR002073" name="3'5'-cyclic nucleotide
> phosphodiesterase" type="Domain" parent_id="IPR003607">
>
> The problem seems to be the "'" annotation at the second line.
Did you try deleting the two ' characters from the entry, and did it
then parse fine? If not, the ' is not the problem.
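One way to check this in isolation is to feed the entry straight to
XML::Parser, outside of Bio::SeqIO. The following is only a rough sketch:
the closing of the two elements is my assumption, added so that the
excerpt forms a complete document, and is_well_formed is a made-up helper
name. An apostrophe inside a double-quoted attribute value is legal XML,
so this reports whether that alone upsets expat.

```perl
use strict;
use warnings;
use XML::Parser;

# The entry from the post, on unwrapped lines. The self-closing
# <interpro .../> and the </protein> end tag are assumptions added
# so that the excerpt is a complete document.
my $xml = <<'XML';
<protein id="O00408" name="CN2A_HUMAN" length="941" crc64="9797609B487FD64E">
<interpro id="IPR002073" name="3'5'-cyclic nucleotide phosphodiesterase" type="Domain" parent_id="IPR003607"/>
</protein>
XML

# Hypothetical helper: returns 1 if expat accepts the text, 0 otherwise.
# XML::Parser dies on malformed input, hence the eval.
sub is_well_formed {
    my ($text) = @_;
    my $ok = eval { XML::Parser->new->parse($text); 1 };
    return $ok ? 1 : 0;
}

print is_well_formed($xml)
    ? "well-formed\n"
    : "not well-formed: $@";
```

If this parses, the cause has to be something else in the original file
at the reported byte offset.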
>
> I also tested if an eval clause could be used to bypass such entries
> without crashing a script. The example script below worked fine and
> reported a problem with the entry above without crashing.
This will work as long as you don't need to resume parsing of the block
of text that raised the exception, and as long as the file pointer is
properly advanced. Given the way SeqIO::interpro.pm works, neither seems
to be a problem.
>
> Would it be too difficult to make interpro.pm able to parse names like
> the one above?
It is the XML parser (expat) that throws the exception. Once that has
happened, there is nothing interpro.pm can do to mitigate it. The only
remedy is to pre-process the text block before it is parsed, so that it
won't raise exceptions.
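As a rough sketch of what such pre-processing could look like: the helper
below is hypothetical and not part of Bio::SeqIO::interpro, and the two
clean-ups it performs are my guesses at common causes of "invalid token"
errors, not a diagnosis of this particular entry. It removes control
characters that XML 1.0 does not allow and escapes ampersands that do not
start an entity or character reference.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: clean a block of text
# before it is handed to the XML parser.
sub sanitize_xml_block {
    my ($text) = @_;

    # Remove control characters that XML 1.0 forbids
    # (everything below 0x20 except tab, newline and carriage return).
    $text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F]//g;

    # Escape ampersands that do not begin a named entity
    # or a decimal/hexadecimal character reference.
    $text =~ s/&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;

    return $text;
}

# Made-up input for illustration only, not a real InterPro record.
my $raw = qq{<interpro id="example" name="A & B\x01 domain"/>};
print sanitize_xml_block($raw), "\n";
```

Whether this helps depends entirely on what actually sits at the reported
byte offset in your file.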
-hilmar
>
> Robson
>
> ##################################################
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::SeqIO;
>
> my $in = Bio::SeqIO->new(-file=>$ARGV[0],
> -format=>"interpro");
>
> my $i=1;
> while (1) {
>     my $seq;
>     eval {
>         $seq = $in->next_seq;
>     };
>     # Check for a parse error first: $seq is also undefined when
>     # next_seq dies, so testing it first would end the loop silently.
>     if ($@) {
>         print STDERR "Problem parsing sequence $i: $@\n";
>         $i++;
>         next;
>     }
>     last if (!defined $seq);
>     print STDERR $seq->id,"\n";
>     print "<=== ",$seq->id,"===>\n";
>     foreach my $f ($seq->get_all_SeqFeatures) {
>         print $f->gff_string,"\n";
>         foreach my $key ($f->annotation->get_all_annotation_keys) {
>             foreach my $value ($f->annotation->get_Annotations($key)) {
>                 print $key,":",$value->as_text,"\n";
>             }
>         }
>     }
>     $i++;
> }
>
> exit 0;
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------