[Bioperl-l] Some more troubles with HTML module?

Steve Chervitz sac@neomorphic.com
Mon, 30 Oct 2000 01:30:43 -0800


Ewan Birney wrote:

> On Wed, 25 Oct 2000, Carl Virtanen wrote:
>
> > Hi folks,
> >
> > I'm a little new at checking out some of this stuff, so please bear with me. I'm using bioperl 0.6.2.
> > The problem I'm having is that the output from the Blast->to_html routine is not picking up all of the correct references and 'htmlifying' them (see my example near the bottom).  I'm just using the standard usage:
> > use Bio::Tools::Blast qw(:obj);
> > $Blast->to_html(file=>$ARGV[0]);
> >
> >  I've narrowed the problem down to the HTML.pm module.  Now, call me a
> > bonehead (if you wish, but that wouldn't be really nice now would it?)
> > but the regexps in there are some real bad ass ones (if you'll excuse
> > my colourful explanation)! So tracking down where the problem is is
> > not so easy for me.  Actually, if somebody would explain to me at
> > least one of the regexps, for example:
> >
> > s@^ ?(gb|emb|dbj)\|($Word)(\|$Word)?($Descrip)($Int +)($Signif)(.*)$@$1:<a href="$DbUrl{'gb_n'}$2">$2$3</a>$4$5<A href="\#$2_A">$6</a>$7<a name="$2_H"></a>@o;
>
> Apologies. That looks like a _beast_.
>

C'mon guys, it's not that bad! (At least there aren't any nested parens in this example, as there are in some of the others ;). All of the HTML formatting is achieved by a set of substitution regexps that attempt to identify
the database, sequence id, etc. and then substitute in HTML links to either external resources or to internal positions in the document.
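To make the capture groups concrete, here is a stripped-down, runnable sketch of that same substitution. The definitions of $Word, $Descrip, $Int, $Signif and the URL in %DbUrl are hypothetical simplifications, not the ones in HTML.pm, and the optional second id is written as ((?:\|$Word)?) so that $3 is never undefined. It also shows that a line starting with "ref|" passes through untouched, simply because "ref" is not one of the (gb|emb|dbj) alternatives.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-ins for the pattern variables in
# Bio::Tools::Blast::HTML; the real definitions in HTML.pm differ.
my $Word    = qr/[\w.]+/;      # id token, e.g. BAB14367.1
my $Descrip = qr/.+?\s/;       # description, matched lazily up to the score column
my $Int     = qr/\d+/;         # bit score
my $Signif  = qr/[\de.-]+/;    # E-value, e.g. 2e-44
my %DbUrl   = (gb_n => 'http://example.org/entrez?acc=');   # placeholder URL

# $1 db, $2 id, $3 optional second id, $4 description, $5 score,
# $6 E-value, $7 rest of line. The E-value links down to <id>_A, and
# <id>_H is the target for links back up from the alignment section.
sub htmlify_desc_line {
    my ($line) = @_;
    $line =~ s{^ ?(gb|emb|dbj)\|($Word)((?:\|$Word)?)($Descrip)($Int +)($Signif)(.*)$}
              {$1:<a href="$DbUrl{'gb_n'}$2">$2$3</a>$4$5<a href="#${2}_A">$6</a>$7<a name="${2}_H"></a>}o;
    return $line;
}

for my $line (
    'dbj|BAB14367.1| (AK023031) unnamed protein product [Homo sapiens]     181  2e-44',
    # No match: "ref" is not one of the alternatives, so the line is unchanged.
    'ref|NP_055525.1| KIAA0443 gene product >gi|7512985|pir||T00068 h...   214  2e-54',
) {
    print htmlify_desc_line($line), "\n";
}
```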

So, for example, the <A href="\#$2_A">$6</a> bit creates an internal link from the E-value in the description line to the alignment section further down in the report. The <a name="..."> bit creates an internal target so you
can link back to the description line from the alignment section, which gets processed by a different substitution regexp. It gets easier to understand these after you stare at them for a minute or two.
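The alignment half of the round trip can be sketched the same way. The rule below is hypothetical (the regexp HTML.pm actually uses for alignment headers is different); it only illustrates how an <id>_A target and a link back to <id>_H pair up with the anchors placed on the description line.

```perl
use strict;
use warnings;

# Hypothetical id pattern; the real HTML.pm definitions differ.
my $Word = qr/[\w.]+/;

# Hypothetical alignment-header rule: place the <id>_A target that the
# E-value link points to, and link the id back up to the <id>_H anchor
# on the description line. The leading '>' is escaped as &gt;.
sub htmlify_align_header {
    my ($line) = @_;
    $line =~ s{^>(gb|emb|dbj)\|($Word)}
              {<a name="${2}_A"></a>&gt;<a href="#${2}_H">$1|$2</a>}o;
    return $line;
}

print htmlify_align_header('>dbj|BAB14367.1| (AK023031) unnamed protein product [Homo sapiens]'), "\n";
```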

This is a good example of programming by regexp that is only possible in perl (well, easier to do in perl than in other languages). Every line in the Blast report is analyzed by the same set of regexps.  Matching lines are
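processed appropriately by the associated substitution. The little 'o' modifier at the end tells perl to interpolate the variables ($Word, $Descrip, etc.) and compile each regexp only once, the first time it is used, rather than again for every line.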
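The effect of 'o' is easy to see in isolation. In this minimal sketch (the two helper names are made up for illustration) the first call through the /o match freezes the pattern, so a later call with a different $pat still uses the old one. That is harmless in HTML.pm because $Word, $Descrip and friends never change once set.

```perl
use strict;
use warnings;

# With /o, the pattern is interpolated and compiled the first time this
# match op runs; later calls reuse it even if $pat has changed.
sub matches_once {
    my ($pat, $str) = @_;
    return $str =~ /$pat/o ? 1 : 0;
}

# Without /o, the pattern follows the current value of $pat.
sub matches_each_time {
    my ($pat, $str) = @_;
    return $str =~ /$pat/ ? 1 : 0;
}

print matches_once('gb', 'gb|AAF64273.1'), "\n";         # 1: compiles /gb/
print matches_once('emb', 'emb|CAB55683.1'), "\n";       # 0: still /gb/
print matches_each_time('emb', 'emb|CAB55683.1'), "\n";  # 1
```

In more recent perls, building the pattern once with qr// and matching against that object gives the same one-time compilation without this surprise.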

I just updated the Blast::HTML module to deal with lines like the ones Carl reported. You can obtain this updated version at ftp://bio.perl.org/pub/sac/blast/HTML.pm. Just replace the old version of Bio/Tools/Blast/HTML.pm with this
file (unless you have other customizations that you want to save, in which case do a diff).

>
> This one is for stevec if he is tuning in. I have to admit, I tend to
> generate html files myself by going through the loops
>
> foreach $hit ( ... ){
>     foreach $hsp ( ... ) {
>
>     }
> }
>
> etc. But that is more coding for you....
>
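For comparison, Ewan's nested-loop approach looks roughly like this. The data structure is a hypothetical stand-in (plain hashes, not Bioperl's hit and HSP objects), so the point is only the shape of the loops: the whole report has to be parsed into memory before any HTML is written.

```perl
use strict;
use warnings;

# Hypothetical in-memory results: plain hashes standing in for the hit and
# HSP objects a real parser would build after reading the whole report.
my @hits = (
    { name => 'dbj|BAB14367.1|',
      desc => 'unnamed protein product [Homo sapiens]',
      hsps => [ { score => 181, expect => '2e-44' } ] },
    { name => 'ref|NP_055525.1|',
      desc => 'KIAA0443 gene product >gi|7512985|pir||T00068',
      hsps => [ { score => 214, expect => '2e-54' } ] },
);

# Escape characters that are special in HTML (descriptions may hold '>').
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Nested loops over hits and their HSPs, one list item per HSP.
sub hits_to_html {
    my (@hit_list) = @_;
    my $html = "<ul>\n";
    for my $hit (@hit_list) {
        for my $hsp (@{ $hit->{hsps} }) {
            $html .= sprintf "<li>%s %s: score %d, E = %s</li>\n",
                esc($hit->{name}), esc($hit->{desc}), $hsp->{score}, $hsp->{expect};
        }
    }
    return $html . "</ul>\n";
}

print hits_to_html(@hits);
```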

One advantage of using the built-in HTML formatting functionality of the Blast module is that you don't have to parse the whole Blast report into memory before generating HTML. The HTML can be generated line by line from
STDIN. This can come in handy for large reports that you want to examine via web browser.
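The streaming style can be sketched as a simple filter loop. Here htmlify_line() is a hypothetical stand-in that only escapes HTML metacharacters, whereas the real module applies its link-making substitutions to each line; the <pre> wrapper is likewise an assumption for the sketch. The demo uses in-memory filehandles so it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-line substitutions in Blast::HTML:
# here it only escapes characters that are special in HTML.
sub htmlify_line {
    my ($line) = @_;
    $line =~ s/&/&amp;/g;
    $line =~ s/</&lt;/g;
    $line =~ s/>/&gt;/g;
    return $line;
}

# Read the report one line at a time and write HTML as we go, so only
# the current line is held in memory, however large the report is.
sub stream_to_html {
    my ($in, $out) = @_;
    print {$out} "<pre>\n";
    while (my $line = <$in>) {
        print {$out} htmlify_line($line);
    }
    print {$out} "</pre>\n";
}

# Demo with in-memory filehandles; as a command-line filter you would
# call stream_to_html(\*STDIN, \*STDOUT) instead.
my $report = "Sequences producing significant alignments:\n>gi|7512985|pir||T00068\n";
open my $in, '<', \$report or die "open input: $!";
my $html = '';
open my $out, '>', \$html or die "open output: $!";
stream_to_html($in, $out);
close $out;
print $html;
```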

BTW, you don't actually use the Blast::HTML module directly. It is used by the to_html() function of the Blast module. For an example of usage, see examples/blast/html.pl in the Bioperl distribution.

Steve
--
Steve Chervitz
sac@neomorphic.com

>
> >
> >
> > then I would be very grateful and would even try to track down the problem myself and possibly contribute a little to all of this. I'm familiar with basic regexps/substitution and so on, but yikes!
> >
> > Anyways, here's the output, and you can see that it's missing a bunch of gi's. The search was just a routine peek at some proteins in the nr database:
> >
> > Sequences producing significant alignments:                        (bits)  Value
> >
> > emb|CAB55683.1| (AL035427) dJ769N13.1 (KIAA0443 protein.) [Homo ...   214  2e-54
> > ref|NP_055525.1| KIAA0443 gene product >gi|7512985|pir||T00068 h...   214  2e-54
> > dbj|BAB14367.1| (AK023031) unnamed protein product [Homo sapiens]     181  2e-44
> > gb:AAF64273.1|AF208859_1 (AF208859) BM-017 [Homo sapiens] >gi|82...   123  6e-27
> >
> >
> >
> > Thanks!
> >
> > Carl Virtanen
> >
> >
> >
>
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>.
> -----------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l