[Bioperl-l] Re: parsing BLAST html

Wed Aug 13 14:31:12 EDT 2003

Brian, you may want to add that something like this should also work:

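use strict;
use warnings;
use Bio::SearchIO;
use Bio::SearchIO::blast;
use HTML::Strip;

my $hs = HTML::Strip->new;

# replace the blast parser's _readline method with one that
# auto-strips HTML.  SUPER:: is resolved relative to the package the
# sub is compiled in, so the override has to be declared inside
# package Bio::SearchIO::blast:

{
  package Bio::SearchIO::blast;
  sub _readline {
    my ($self, @args) = @_;
    my $line = $self->SUPER::_readline(@args);
    # pass undef through at end of input so the parser knows to stop
    return defined $line ? $hs->parse($line) : undef;
  }
}

my $io = Bio::SearchIO->new(-file => "etc", -format => "blast");
# etc ...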
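
Once the parser is reading stripped lines, the results can be walked with
the usual SearchIO iterators. A minimal sketch, assuming $io is the
Bio::SearchIO object created above; hit_rows is a hypothetical helper name
used only for illustration, not something BioPerl provides:

```perl
use strict;
use warnings;

# Collect one [query name, hit name, e-value] row per HSP from a
# SearchIO stream.  hit_rows is a hypothetical helper, not a BioPerl API;
# it only relies on the standard next_result/next_hit/next_hsp iterators.
sub hit_rows {
  my ($io) = @_;
  my @rows;
  while (my $result = $io->next_result) {
    while (my $hit = $result->next_hit) {
      while (my $hsp = $hit->next_hsp) {
        push @rows, [$result->query_name, $hit->name, $hsp->evalue];
      }
    }
  }
  return @rows;
}

# Usage, assuming $io was opened as above:
#   print join("\t", @$_), "\n" for hit_rows($io);
```

Returning plain rows rather than printing inside the loops keeps the
reporting separate from the parsing.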

-Aaron

On Wed, 13 Aug 2003, Brian Osborne wrote:

> Sofia,
>
> Just making sure here. The output from StripHTML can be parsed by SearchIO?
> This probably belongs in the FAQ.
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Sofia
> Sent: Tuesday, August 12, 2003 10:41 AM
> To: Bioperl Mailing List
> Subject: [Bioperl-l] Re: parsing BLAST html
>
> I use PerlIO::via::StripHTML and it works quite successfully
> -
> Sofia
>
> Hi Wes,
> Before I parse my html blast I use PerlIO::via::StripHTML.  It removes all
> html and I save the new file as the originalFileName.out.  I like the html
> blast output because I save them later for another use. But if I didn't need
> them I would just use text output.
>
> use strict;
> use Bio::SearchIO;
> use PerlIO::via::StripHTML;
>
> my @dir_html_files = </usr/local/www/Blast/NcbiBlast/*.htm>;
> foreach my $file (@dir_html_files){
>  my $outfile  = $file.".out";
>  # write the stripped text next to the original HTML file
>  open OUTFILE, ">$outfile"
>     or die "Can't open $outfile: $!\n";
>  open INFILE, '<:via(StripHTML)', $file
>     or die "Can't open $file: $!\n";
>  while (<INFILE>){
>   print OUTFILE $_;
>  }
>  close INFILE;
>  close OUTFILE;
> }
>
> -Sofia
> ----- Original Message -----
> From: "Jason Stajich" <jason at cgt.duhs.duke.edu>
> To: "Wes Barris" <wes.barris at csiro.au>
> Cc: "Bioperl Mailing List" <bioperl-l at bioperl.org>
> Sent: Tuesday, August 12, 2003 6:23 AM
> Subject: Re: [Bioperl-l] Parsing html blast output?
>
>
> > No, it is not currently possible to parse BLAST HTML output.
> >
> > On Tue, 12 Aug 2003, Wes Barris wrote:
> >
> > > Hi,
> > >
> > > I know it is possible to use the SearchIO functions to parse either
> > > text blast output or xml blast output.  However, I would like to know
> > > if it is possible to parse html blast output?  For example, if I wanted
> > > to parse the output of this command:
> > >
> > > blastcl3 -d nr -p blastn -T -i fasta.txt -o blast.html
> > >
> > > When I try parsing the above "blast.html" file using example number 4
> > > from this file:
> > >
> > > http://bioperl.org/HOWTOs/html/Graphics-HOWTO.html
> > >
> > > I get errors.
> > >
> > > What I ended up doing is writing a perl "de-htmlizer" that I use to
> > > convert an html blast output file into a text-only blast output file.
> > > Then I run the result through a bioperl blast parsing script.  Is
> > > there a more elegant way to do this?
> > >
> > >
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman
>
>

-- 
 Aaron J Mackey
 Pearson Laboratory
 University of Virginia
 (434) 924-2821
 amackey at virginia.edu