[Bioperl-l] parsing an html blast result file
Jason Stajich
jason at cgt.duhs.duke.edu
Wed Jul 23 10:14:32 EDT 2003
Later versions of NCBI BLAST XML output aren't formed correctly - or XML::Parser
is tripping up on something it should ignore.
I haven't had time to figure out a proper fix, but basically if
you make the top of your XML file look like
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN"
"NCBI_BlastOutput.dtd"><BlastOutput>
instead of
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN"
"NCBI_BlastOutput.dtd">
<BlastOutput>
it should work. I wanted to put something into the preprocessing in SearchIO
to handle it, but haven't had time.
I'm no XML lover/expert, so I haven't really tried to dig into why this
is tripping up XML::Parser.
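
A minimal sketch of that preprocessing as a standalone script, assuming the
only problem is the whitespace between the end of the DOCTYPE declaration and
the opening <BlastOutput> tag. The helper name join_doctype_and_root and the
script name fix_blastxml.pl are made up for illustration and are not part of
SearchIO; this has not been checked against every BLAST version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): remove any whitespace between
# the closing '>' of the DOCTYPE declaration and the <BlastOutput> tag.
# Text that does not match the pattern is returned unchanged.
sub join_doctype_and_root {
    my ($xml) = @_;
    $xml =~ s/("NCBI_BlastOutput\.dtd"\s*>)\s+(<BlastOutput>)/$1$2/;
    return $xml;
}

# Usage (assumed script name): perl fix_blastxml.pl report.xml > fixed.xml
if (@ARGV) {
    my $path = shift @ARGV;
    open my $in, '<', $path or die "cannot open $path: $!";
    local $/;               # slurp mode: read the whole file in one go
    my $xml = <$in>;
    close $in;
    print join_doctype_and_root($xml);
}
```

The rewritten file can then be given to Bio::SearchIO->new with
-format => 'blastxml' in the usual way.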
-jason
On Wed, 23 Jul 2003, Wes Barris wrote:
> Hi,
>
> I have installed bioperl et al. on a Sun (Solaris 8). I would now like
> to try parsing an html blast results file. I saved example 4 from this
> page into a file:
>
> http://www.bioperl.org/HOWTOs/html/Graphics-HOWTO.html
>
> The only thing I changed in the file is the format of the input file
> from this:
>
> -format => 'blast') or die "parse failed";
>
> to this:
>
> -format => 'blastxml') or die "parse failed";
>
> I am assuming that the format of an html blast result file is "blastxml",
> but I could be wrong. I could not find a list of valid formats that can
> be used with the Bio::SearchIO->new constructor.
>
> When I run the example 4 script, I get this error:
>
> wes at sequence> blasttoimg.pl junk.html >junk.png
>
> -------------------- WARNING ---------------------
> MSG: error in parsing a report:
>
> not well-formed (invalid token) at line 9, column 34, byte 238 at
> /usr/local/lib/perl5/site_perl/5.6.1/sun4-solaris/XML/Parser.pm line 185
>
> ---------------------------------------------------
> no result at /home/wes/proj/blast/blasttoimg.pl line 15, <GEN1> line 669.
>
> Could anyone suggest what I might try to make this work?
>
> #!/usr/local/bin/perl
>
> # This is code example 4 in the Graphics-HOWTO
> use strict;
> #use lib "$ENV{HOME}/projects/bioperl-live";
> use Bio::Graphics;
> use Bio::SearchIO;
>
> my $file = shift or die "Usage: render4.pl <blast file>\n";
>
> my $searchio = Bio::SearchIO->new(-file => $file,
>                                   -format => 'blastxml') or die "parse failed";
>
>
> my $result = $searchio->next_result() or die "no result";
>
> my $panel = Bio::Graphics::Panel->new(-length => $result->query_length,
>                                       -width => 800,
>                                       -pad_left => 10,
>                                       -pad_right => 10,
>                                      );
>
> my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>$result->query_length,
>                                                 -seq_id=>$result->query_name);
>
> $panel->add_track($full_length,
>                   -glyph => 'arrow',
>                   -tick => 2,
>                   -fgcolor => 'black',
>                   -double => 1,
>                   -label => 1,
>                  );
>
> my $track = $panel->add_track(-glyph => 'graded_segments',
>                               -label => 1,
>                               -connector => 'dashed',
>                               -bgcolor => 'blue',
>                               -font2color => 'red',
>                               -sort_order => 'high_score',
>                               -description => sub {
>                                   my $feature = shift;
>                                   return unless $feature->has_tag('description');
>                                   my ($description) = $feature->each_tag_value('description');
>                                   my $score = $feature->score;
>                                   "$description, score=$score";
>                               });
>
> while( my $hit = $result->next_hit ) {
>     next unless $hit->significance < 1E-20;
>     my $feature = Bio::SeqFeature::Generic->new(-score => $hit->raw_score,
>                                                 -seq_id => $hit->name,
>                                                 -tag => {
>                                                     description => $hit->description
>                                                 },
>                                                );
>     while( my $hsp = $hit->next_hsp ) {
>         $feature->add_sub_SeqFeature($hsp,'EXPAND');
>     }
>
>     $track->add_feature($feature);
> }
>
> print $panel->png;
>
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu