[Bioperl-l] Alignment Blast parser

Jason Stajich jason at cgt.duhs.duke.edu
Fri Jan 23 08:20:35 EST 2004


Currently the code to do this sort of thing is embedded in
SearchIO::Writer::TextResultWriter but generates a whole report.

Someone can move the code out that part of the code to a separate function
so it can be used for situations like what Nathan is asking for.  Not sure
where it should go at this point other than in TextResultWriter - we don't
really have a class of 'HSP consumers' other than in there.

Will's code is smarter as it figures out the multiplier from the start/end
while we code it up based on the alignment alg (TBLASTN, BLASTX, etc).


-jason
On Fri, 23 Jan 2004, Will Spooner wrote:

> Hi Nathan,
>
> The following code creates a multi-line wu-blast-style alignment string
> from an HSP object. Is this what you are looking for? I would have
> liked to have found a method like this on the HSP object itself, but I'm
> not sure that this is where it strictly 'belongs'. I'd be happy to
> contribute the code if someone knows the best place to put it!
>
> All the best,
>
> Will
>
>
> sub alignment_string{
>   my $hsp = shift;
>   my $query = $hsp->query;
>   my $sbjct = $hsp->hit;
>
>
>   # Space to reserve for the numbering at the line start
>   my $seq_cols = 60;
>   my( $num_length ) = sort{ $b<=>$a } ( $query->start,
>                                         $query->end,
>                                         $sbjct->start,
>                                         $sbjct->end );
>   $num_length = length( $num_length );
>
>   # Templates for the lines
>   my $qtmpl = "Query: %${num_length}d %s %d\n";
>   my $xtmpl = ( " " x ( $num_length + 8 ) ) .  "%s\n";
>   my $htmpl = "Sbjct: %${num_length}d %s %d\n";
>
>   # Divide the alignment strings onto lines
>   my $rows = ( ( length($hsp->query_string) - 1 ) / $seq_cols ) + 1;
>   my @qlines = unpack( "a$seq_cols" x $rows, $hsp->query_string );
>   my @xlines = unpack( "a$seq_cols" x $rows, $hsp->homology_string );
>   my @hlines = unpack( "a$seq_cols" x $rows, $hsp->hit_string );
>
>   # Things needed for counting; DNA|peptide
>   my $qmultiplier = ( ( $query->end - $query->start ) /
>                       ( $sbjct->end - $sbjct->start ) );
>   my $smultiplier;
>   if( $qmultiplier < 0.5  ){ $qmultiplier = 1; $smultiplier=3 }
>   elsif( $qmultiplier > 2 ){ $qmultiplier = 3; $smultiplier=1 }
>   else                     { $qmultiplier = 1; $smultiplier=1 }
>
>   # More counting things; strand
>   my $qstrand = $query->strand < 0 ? -1 : 1;
>   my $sstrand = $sbjct->strand < 0 ? -1 : 1;
>   my( $qstart, $qryend ) = $query->strand < 0 ?
>      ( $query->end, $query->start) : ( $query->start, $query->end );
>   my( $hstart, $sbjend ) = $sbjct->strand < 0 ?
>     ( $sbjct->end, $sbjct->start ) : ( $sbjct->start, $sbjct->end );
>
>   # Generate text for each line-triplet
>   my @lines;
>   for( my $i=0; $i<@qlines; $i++ ){
>     my $qend = $qstart + ( ( $seq_cols * $qmultiplier - 1 ) * $qstrand );
>     my $hend = $hstart + ( ( $seq_cols * $smultiplier - 1 ) * $sstrand );
>     if( $i == @qlines - 1 ){
>       $qend = $qryend;
>       $hend = $sbjend;
>     }
>     my $line = '';
>     $line .= sprintf( $qtmpl, $qstart, $qlines[$i], $qend );
>     $line .= sprintf( $xtmpl, $xlines[$i] );
>     $line .= sprintf( $htmpl, $hstart, $hlines[$i], $hend );
>     push @lines, $line;
>     $qstart = $qend + ( 1 * $qstrand );
>     $hstart = $hend + ( 1 * $sstrand );
>   }
>
>   return join( "\n", @lines );
> }
>
>
> On Thu, 22 Jan 2004, Brian Osborne wrote:
>
> > Jason,
> >
> > I think he wants to print out the exact same alignment as you see in the
> > BLAST report, which you can't do using AlignIO.
> >
> > Nathan, do you mean "exactly", with that "Query: 2" and "Subject: 2"
> > business as well?
> >
> > Brian O.
> >
> > -----Original Message-----
> > From: bioperl-l-bounces at portal.open-bio.org
> > [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Jason Stajich
> > Sent: Thursday, January 22, 2004 2:14 PM
> > To: Agrin, Nathan
> > Cc: bioperl-l at portal.open-bio.org
> > Subject: Re: [Bioperl-l] Alignment Blast parser
> >
> > You want to retain it meaning what?
> >  Bio::SearchIO(-format => 'blast')
> > parses blast for you.
> >
> > -jason
> > On Thu, 22 Jan 2004, Agrin, Nathan wrote:
> >
> > > I need a way to retain the EXACT alignment found in a blast file.
> > > Ex-
> > >
> > >
> > > Query: 1   MAGRGK-GKTSGKKAVSRSAKAGLQFPVGRIARYLKKGKYAERIGAGAPVYLAAVLEYLT
> > > 59
> > >            M+GRGK G  +  KA SRS++AGLQFPVGR+ R L+KG YAER+GAGAPVYLAAVLEYLT
> > > Sbjct: 1   MSGRGKQGGKARAKAKSRSSRAGLQFPVGRVHRLLRKGNYAERVGAGAPVYLAAVLEYLT
> > > 60
> > >
> > >
> > > Query: 60  AEVLELAGNAARDNKKNRIVPRHIQLAIRNDEELGKLLGEVTIASGGVLPNIHAVLLP
> > > 117
> > >            AE+LELAGNAARDNKK RI+PRH+QLAIRNDEEL KLLG VTIA GGVLPNI AVLLP
> > > Sbjct: 61  AEILELAGNAARDNKKTRIIPRHLQLAIRNDEELNKLLGRVTIAQGGVLPNIQAVLLP
> > > 118
> > >
> > > I have tried everything to no avail.  AlignIO almost works out, but
> > > instead of doing a typical blast alignment, the best it can do is a
> > > clustalw like alignment.
> > >
> > > Any help is much appreciated.
> > >
> > > -Nate
> > >
> > > Nathan Agrin
> > > Research Associate
> > > UMass Medical Center
> > > 55 Lake Ave. N.
> > > Worcester MA, 01655
> > > (508)-856-6018
> > > nathan.agrin at umassmed.edu
> > >
> > >
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> ---
> Dr William Spooner                          whs at sanger.ac.uk
> Ensembl Web Developer                 http://www.ensembl.org
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


More information about the Bioperl-l mailing list