[Bioperl-l] Alignment Blast parser
Will Spooner
whs at sanger.ac.uk
Fri Jan 23 04:11:38 EST 2004
Hi Nathan,
The following code creates a multi-line wu-blast-style alignment string
from an HSP object. Is this what you are looking for? I would have
liked to have found a method like this on the HSP object itself, but I'm
not sure that this is where it strictly 'belongs'. I'd be happy to
contribute the code if someone knows the best place to put it!
All the best,
Will
sub alignment_string{
my $hsp = shift;
my $query = $hsp->query;
my $sbjct = $hsp->hit;
# Space to reserve for the numbering at the line start
my $seq_cols = 60;
my( $num_length ) = sort{ $b<=>$a } ( $query->start,
$query->end,
$sbjct->start,
$sbjct->end );
$num_length = length( $num_length );
# Templates for the lines
my $qtmpl = "Query: %${num_length}d %s %d\n";
my $xtmpl = ( " " x ( $num_length + 8 ) ) . "%s\n";
my $htmpl = "Sbjct: %${num_length}d %s %d\n";
# Divide the alignment strings onto lines
my $rows = ( ( length($hsp->query_string) - 1 ) / $seq_cols ) + 1;
my @qlines = unpack( "a$seq_cols" x $rows, $hsp->query_string );
my @xlines = unpack( "a$seq_cols" x $rows, $hsp->homology_string );
my @hlines = unpack( "a$seq_cols" x $rows, $hsp->hit_string );
# Things needed for counting; DNA|peptide
my $qmultiplier = ( ( $query->end - $query->start ) /
( $sbjct->end - $sbjct->start ) );
my $smultiplier;
if( $qmultiplier < 0.5 ){ $qmultiplier = 1; $smultiplier=3 }
elsif( $qmultiplier > 2 ){ $qmultiplier = 3; $smultiplier=1 }
else { $qmultiplier = 1; $smultiplier=1 }
# More counting things; strand
my $qstrand = $query->strand < 0 ? -1 : 1;
my $sstrand = $sbjct->strand < 0 ? -1 : 1;
my( $qstart, $qryend ) = $query->strand < 0 ?
( $query->end, $query->start) : ( $query->start, $query->end );
my( $hstart, $sbjend ) = $sbjct->strand < 0 ?
( $sbjct->end, $sbjct->start ) : ( $sbjct->start, $sbjct->end );
# Generate text for each line-triplet
my @lines;
for( my $i=0; $i<@qlines; $i++ ){
my $qend = $qstart + ( ( $seq_cols * $qmultiplier - 1 ) * $qstrand );
my $hend = $hstart + ( ( $seq_cols * $smultiplier - 1 ) * $sstrand );
if( $i == @qlines - 1 ){
$qend = $qryend;
$hend = $sbjend;
}
my $line = '';
$line .= sprintf( $qtmpl, $qstart, $qlines[$i], $qend );
$line .= sprintf( $xtmpl, $xlines[$i] );
$line .= sprintf( $htmpl, $hstart, $hlines[$i], $hend );
push @lines, $line;
$qstart = $qend + ( 1 * $qstrand );
$hstart = $hend + ( 1 * $sstrand );
}
return join( "\n", @lines );
}
On Thu, 22 Jan 2004, Brian Osborne wrote:
> Jason,
>
> I think he wants to print out the exact same alignment as you see in the
> BLAST report, which you can't do using AlignIO.
>
> Nathan, do you mean "exactly", with that "Query: 2" and "Subject: 2"
> business as well?
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Jason Stajich
> Sent: Thursday, January 22, 2004 2:14 PM
> To: Agrin, Nathan
> Cc: bioperl-l at portal.open-bio.org
> Subject: Re: [Bioperl-l] Alignment Blast parser
>
> You want to retain it meaning what?
> Bio::SearchIO(-format => 'blast')
> parses blast for you.
>
> -jason
> On Thu, 22 Jan 2004, Agrin, Nathan wrote:
>
> > I need a way to retain the EXACT alignment found in a blast file.
> > Ex-
> >
> >
> > Query: 1 MAGRGK-GKTSGKKAVSRSAKAGLQFPVGRIARYLKKGKYAERIGAGAPVYLAAVLEYLT
> > 59
> > M+GRGK G + KA SRS++AGLQFPVGR+ R L+KG YAER+GAGAPVYLAAVLEYLT
> > Sbjct: 1 MSGRGKQGGKARAKAKSRSSRAGLQFPVGRVHRLLRKGNYAERVGAGAPVYLAAVLEYLT
> > 60
> >
> >
> > Query: 60 AEVLELAGNAARDNKKNRIVPRHIQLAIRNDEELGKLLGEVTIASGGVLPNIHAVLLP
> > 117
> > AE+LELAGNAARDNKK RI+PRH+QLAIRNDEEL KLLG VTIA GGVLPNI AVLLP
> > Sbjct: 61 AEILELAGNAARDNKKTRIIPRHLQLAIRNDEELNKLLGRVTIAQGGVLPNIQAVLLP
> > 118
> >
> > I have tried everything to no avail. AlignIO almost works out, but
> > instead of doing a typical blast alignment, the best it can do is a
> > clustalw like alignment.
> >
> > Any help is much appreciated.
> >
> > -Nate
> >
> > Nathan Agrin
> > Research Associate
> > UMass Medical Center
> > 55 Lake Ave. N.
> > Worcester MA, 01655
> > (508)-856-6018
> > nathan.agrin at umassmed.edu
> >
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
---
Dr William Spooner whs at sanger.ac.uk
Ensembl Web Developer http://www.ensembl.org
More information about the Bioperl-l
mailing list