[Bioperl-l] bad entries in interpro
Allen Day
allenday at ucla.edu
Thu Nov 25 05:44:09 EST 2004
i'm not ignoring you -- i'll get back to you on this soon.
-allen
On Tue, 23 Nov 2004, Robson Francisco de Souza {S} wrote:
> Hi everyone,
>
> A few days ago, Mikko Arvas sent an e-mail to this list asking how to
> ignore bad entries in the matches.xml file from the InterPro database.
> Hilmar Lapp answered asking him to locate the position in the file that
> raises the error message
>
> >> not well-formed (invalid token) at line 2, column 53, byte 131 at
> >> /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm
> >> line 187
>
> Well, I saw no answers on the list, so I'm sending the problematic
> entry below:
>
> <protein id="O00408" name="CN2A_HUMAN" length="941"
> crc64="9797609B487FD64E">
> <interpro id="IPR002073" name="3'5'-cyclic nucleotide
> phosphodiesterase" type="Domain" parent_id="IPR003607">
>
> The problem seems to be the "'" characters in the name on the second line.
>
> I also tested whether an eval block could be used to skip such entries
> without crashing a script. The example script below worked fine and
> reported a problem with the entry above without crashing.
>
> Would it be too difficult to make interpro.pm able to parse names like
> the one above?
>
> Robson
>
> ##################################################
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::SeqIO;
>
> my $in = Bio::SeqIO->new(-file   => $ARGV[0],
>                          -format => "interpro");
>
> my $i = 1;
> while (1) {
>     my $seq;
>     eval {
>         $seq = $in->next_seq;
>     };
>     # Check for a parse error before testing $seq, which is also
>     # undefined when next_seq dies.
>     if ($@) {
>         print STDERR "Problem parsing sequence $i...\n";
>         $i++;
>         next;
>     }
>     last if (!defined $seq);
>     print STDERR $seq->id, "\n";
>     print "<=== ", $seq->id, "===>\n";
>     foreach my $f ($seq->get_all_SeqFeatures) {
>         print $f->gff_string, "\n";
>         foreach my $key ($f->annotation->get_all_annotation_keys) {
>             foreach my $value ($f->annotation->get_Annotations($key)) {
>                 print $key, ":", $value->as_text, "\n";
>             }
>         }
>     }
>     $i++;
> }
>
> exit 0;
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
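
For anyone trying to locate the position quoted in the parser error, here is a minimal sketch of one way to look at the text around a byte offset. It assumes the "byte N" value in the message is a zero-based offset into the file, which is worth checking against your own XML::Parser/expat installation. The helper name context_at_byte is hypothetical (it is not part of BioPerl or XML::Parser), and the sample string is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl or XML::Parser.
# Returns up to $width bytes before byte offset $byte in $text, followed by
# up to $width bytes starting at that offset. Returns '' if out of range.
sub context_at_byte {
    my ($text, $byte, $width) = @_;
    $width = 20 unless defined $width;
    return '' if $byte < 0 || $byte >= length($text);
    my $start = $byte > $width ? $byte - $width : 0;
    return substr($text, $start, $byte - $start + $width);
}

# Made-up sample. A real run would read the file named on the command line
# in binary mode so that offsets count bytes rather than characters.
my $sample = qq{<protein id="X">\n<interpro name="3'5'-cyclic"/>\n</protein>\n};

# Stand-in for the "byte N" reported in the error message.
my $offset = index($sample, "'");

print context_at_byte($sample, $offset, 10), "\n";
```

With a real file, the offset from the error message would replace the index() call, and the printed context should make it easier to see which token the parser rejected.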