[Bioperl-l] bad entries in interpro

Allen Day allenday at ucla.edu
Thu Nov 25 05:44:09 EST 2004


i'm not ignoring you -- i'll get back to you on this soon.

-allen


On Tue, 23 Nov 2004, Robson Francisco de Souza {S} wrote:

> Hi everyone,
> 
> A few days ago, Mikko Arvas sent an e-mail to this list asking how to
> ignore bad entries in the matches.xml file from the InterPro database.
> Hilmar Lapp replied, asking him to locate the position in the file that
> triggers this error message:
> 
> >> not well-formed (invalid token) at line 2, column 53, byte 131 at 
> >> /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm 
> >> line 187
> 
> I saw no answer on the list, so I'm sending the problematic
> entry below:
> 
> <protein id="O00408" name="CN2A_HUMAN" length="941" 
>  crc64="9797609B487FD64E">
>     <interpro id="IPR002073" name="3&apos;5&apos;-cyclic nucleotide
>     phosphodiesterase" type="Domain" parent_id="IPR003607">
> 
> The problem seems to be the "&apos;" entity on the second line.
> 
> I also tested whether an eval block could be used to skip such entries
> without crashing a script. The example script below worked fine and
> reported a problem with the entry above without crashing.
> 
> Would it be too difficult to make interpro.pm able to parse names like
> the one above?
> 
> Robson
> 
> ##################################################
> #!/usr/bin/perl -w
> 
> use strict;
> use Bio::SeqIO;
> 
> my $in = Bio::SeqIO->new(-file=>$ARGV[0],
>      -format=>"interpro");
> 
> my $i = 1;
> while (1) {
>     my $seq;
>     eval {
>         $seq = $in->next_seq;
>     };
>     # Check $@ before $seq: $seq is still undefined when next_seq dies,
>     # so testing it first would end the loop without reporting the error.
>     if ($@) {
>         print STDERR "Problem parsing sequence $i...\n";
>         $i++;
>         next;
>     }
>     last if (!defined $seq);
>     print STDERR $seq->id, "\n";
>     print "<=== ", $seq->id, "===>\n";
>     foreach my $f ($seq->get_all_SeqFeatures) {
>         print $f->gff_string, "\n";
>         foreach my $key ($f->annotation->get_all_annotation_keys) {
>             foreach my $value ($f->annotation->get_Annotations($key)) {
>                 print $key, ":", $value->as_text, "\n";
>             }
>         }
>     }
>     $i++;
> }
> 
> exit 0;