[Bioperl-l] Alignment Blast parser
Jason Stajich
jason at cgt.duhs.duke.edu
Fri Jan 23 08:20:35 EST 2004
Currently the code to do this sort of thing is embedded in
SearchIO::Writer::TextResultWriter but generates a whole report.
Someone can move the code out that part of the code to a separate function
so it can be used for situations like what Nathan is asking for. Not sure
where it should go at this point other than in TextResultWriter - we don't
really have a class of 'HSP consumers' other than in there.
Will's code is smarter as it figures out the multiplier from the start/end
while we code it up based on the alignment alg (TBLASTN, BLASTX, etc).
-jason
On Fri, 23 Jan 2004, Will Spooner wrote:
> Hi Nathan,
>
> The following code creates a multi-line wu-blast-style alignment string
> from an HSP object. Is this what you are looking for? I would have
> liked to have found a method like this on the HSP object itself, but I'm
> not sure that this is where it strictly 'belongs'. I'd be happy to
> contribute the code if someone knows the best place to put it!
>
> All the best,
>
> Will
>
>
> sub alignment_string{
> my $hsp = shift;
> my $query = $hsp->query;
> my $sbjct = $hsp->hit;
>
>
> # Space to reserve for the numbering at the line start
> my $seq_cols = 60;
> my( $num_length ) = sort{ $b<=>$a } ( $query->start,
> $query->end,
> $sbjct->start,
> $sbjct->end );
> $num_length = length( $num_length );
>
> # Templates for the lines
> my $qtmpl = "Query: %${num_length}d %s %d\n";
> my $xtmpl = ( " " x ( $num_length + 8 ) ) . "%s\n";
> my $htmpl = "Sbjct: %${num_length}d %s %d\n";
>
> # Divide the alignment strings onto lines
> my $rows = ( ( length($hsp->query_string) - 1 ) / $seq_cols ) + 1;
> my @qlines = unpack( "a$seq_cols" x $rows, $hsp->query_string );
> my @xlines = unpack( "a$seq_cols" x $rows, $hsp->homology_string );
> my @hlines = unpack( "a$seq_cols" x $rows, $hsp->hit_string );
>
> # Things needed for counting; DNA|peptide
> my $qmultiplier = ( ( $query->end - $query->start ) /
> ( $sbjct->end - $sbjct->start ) );
> my $smultiplier;
> if( $qmultiplier < 0.5 ){ $qmultiplier = 1; $smultiplier=3 }
> elsif( $qmultiplier > 2 ){ $qmultiplier = 3; $smultiplier=1 }
> else { $qmultiplier = 1; $smultiplier=1 }
>
> # More counting things; strand
> my $qstrand = $query->strand < 0 ? -1 : 1;
> my $sstrand = $sbjct->strand < 0 ? -1 : 1;
> my( $qstart, $qryend ) = $query->strand < 0 ?
> ( $query->end, $query->start) : ( $query->start, $query->end );
> my( $hstart, $sbjend ) = $sbjct->strand < 0 ?
> ( $sbjct->end, $sbjct->start ) : ( $sbjct->start, $sbjct->end );
>
> # Generate text for each line-triplet
> my @lines;
> for( my $i=0; $i<@qlines; $i++ ){
> my $qend = $qstart + ( ( $seq_cols * $qmultiplier - 1 ) * $qstrand );
> my $hend = $hstart + ( ( $seq_cols * $smultiplier - 1 ) * $sstrand );
> if( $i == @qlines - 1 ){
> $qend = $qryend;
> $hend = $sbjend;
> }
> my $line = '';
> $line .= sprintf( $qtmpl, $qstart, $qlines[$i], $qend );
> $line .= sprintf( $xtmpl, $xlines[$i] );
> $line .= sprintf( $htmpl, $hstart, $hlines[$i], $hend );
> push @lines, $line;
> $qstart = $qend + ( 1 * $qstrand );
> $hstart = $hend + ( 1 * $sstrand );
> }
>
> return join( "\n", @lines );
> }
>
>
> On Thu, 22 Jan 2004, Brian Osborne wrote:
>
> > Jason,
> >
> > I think he wants to print out the exact same alignment as you see in the
> > BLAST report, which you can't do using AlignIO.
> >
> > Nathan, do you mean "exactly", with that "Query: 2" and "Subject: 2"
> > business as well?
> >
> > Brian O.
> >
> > -----Original Message-----
> > From: bioperl-l-bounces at portal.open-bio.org
> > [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Jason Stajich
> > Sent: Thursday, January 22, 2004 2:14 PM
> > To: Agrin, Nathan
> > Cc: bioperl-l at portal.open-bio.org
> > Subject: Re: [Bioperl-l] Alignment Blast parser
> >
> > You want to retain it meaning what?
> > Bio::SearchIO(-format => 'blast')
> > parses blast for you.
> >
> > -jason
> > On Thu, 22 Jan 2004, Agrin, Nathan wrote:
> >
> > > I need a way to retain the EXACT alignment found in a blast file.
> > > Ex-
> > >
> > >
> > > Query: 1 MAGRGK-GKTSGKKAVSRSAKAGLQFPVGRIARYLKKGKYAERIGAGAPVYLAAVLEYLT
> > > 59
> > > M+GRGK G + KA SRS++AGLQFPVGR+ R L+KG YAER+GAGAPVYLAAVLEYLT
> > > Sbjct: 1 MSGRGKQGGKARAKAKSRSSRAGLQFPVGRVHRLLRKGNYAERVGAGAPVYLAAVLEYLT
> > > 60
> > >
> > >
> > > Query: 60 AEVLELAGNAARDNKKNRIVPRHIQLAIRNDEELGKLLGEVTIASGGVLPNIHAVLLP
> > > 117
> > > AE+LELAGNAARDNKK RI+PRH+QLAIRNDEEL KLLG VTIA GGVLPNI AVLLP
> > > Sbjct: 61 AEILELAGNAARDNKKTRIIPRHLQLAIRNDEELNKLLGRVTIAQGGVLPNIQAVLLP
> > > 118
> > >
> > > I have tried everything to no avail. AlignIO almost works out, but
> > > instead of doing a typical blast alignment, the best it can do is a
> > > clustalw like alignment.
> > >
> > > Any help is much appreciated.
> > >
> > > -Nate
> > >
> > > Nathan Agrin
> > > Research Associate
> > > UMass Medical Center
> > > 55 Lake Ave. N.
> > > Worcester MA, 01655
> > > (508)-856-6018
> > > nathan.agrin at umassmed.edu
> > >
> > >
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> ---
> Dr William Spooner whs at sanger.ac.uk
> Ensembl Web Developer http://www.ensembl.org
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
More information about the Bioperl-l
mailing list