[Bioperl-l] Generating a consensus sequence from a Clustal alignment
Bill Stephens
grapeguy at gmail.com
Tue Oct 26 19:15:09 UTC 2010
I've got my script running.
In order to support the identification of degenerate primers I had to create
a new method that would alter the IUPAC consensus. We're interested in
15-30bp sections of the consensus with high, but not 100%, conservation. We
would like to have a couple of "slices" with 2-3 mismatches to allow for the
ordering of a variety of primers.
To be completed:
- I would *love* to hide the Clustal output that is being printed to the
screen
- Primer candidate identification (min_primerlength to max_primerlength)
based on our scoring matrix
- Primer melting temperature calculation
- 3' Clamping identification
- Self complementary primers
This has been a nice change from Java.
Bill
*Code*
assume: I have already read fasta input, performed Clustal align and
obtained the IUPAC consensus
sub getDegenerateConsensusIupac {
my ($aln, $maxGapPercent, $iupacConsensus, $numSequences) = @_;
# convert consensus into array
my @cons = split(//, $iupacConsensus);
# loop over the alignment "columns"
foreach my $position ( 1 .. $aln->length() ) {
my $consCode = $cons[$position-1];
my %slice = getSliceHash($aln, $position);
my $out = _applyGapPercentToConsensus ($maxGapPercent, $consCode,
$numSequences, %slice);
# Convert consensus value new value
if ($out ne $consCode) {
$cons[$position-1] = $out;
}
}
#rebuild consensus
my $degenerateConsensus = join '', @cons;
return $degenerateConsensus;
}
sub _applyGapPercentToConsensus {
my ($maxGapPercent, $consCode, $numSequences, %slice) = @_;
my $gapCount = $slice{'.'};
# no gaps!
if ($gapCount !~ /\d/) {
return $consCode;
}
my $pctGap = ($gapCount / $numSequences) * 100;
if ($pctGap > $maxGapPercent) {
if ($consCode =~ /[A-Z]/) {
return lc $consCode;
}
if ($consCode =~ /[a-z]/) {
return '.';
}
}
return $consCode;
}
sub getSliceHash{
my ($aln, $position) = @_;
my %sliceHash;
foreach my $seq ($aln->each_seq) {
my $res = $seq->subseq($position, $position);
$sliceHash{$res}++;
}
return %sliceHash;
}
On Tue, Oct 19, 2010 at 11:47 AM, Chris Fields <cjfields at illinois.edu>wrote:
> Bio::SimpleAlign is the class that contains the alignment data; it does not
> generate the alignment for you. You can use modules from BioPerl-Run that
> run ClustalW, MUSCLE, T-Coffee, etc to get a Bio::SimpleAlign instance, or
> parse the already-generated alignment output via Bio::AlignIO.
>
> From the Bio::Tools::Run::Alignment::ClustalW docs:
>
> =================================================================
>
> # Build a clustalw alignment factory
> @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
> $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
>
> # Pass the factory a list of sequences to be aligned.
> $inputfilename = 't/data/cysprot.fa';
> $aln = $factory->align($inputfilename); # $aln is a SimpleAlign object.
>
> # ...or
> $seq_array_ref = \@seq_array;
>
> # where @seq_array is an array of Bio::Seq objects
> $aln = $factory->align($seq_array_ref);
>
> =================================================================
>
> $aln is a Bio::SimpleAlign derived from ClustalW output.
>
> From Bio::SimpleAlign (note the use of Bio::AlignIO):
>
> =================================================================
>
> # Use Bio::AlignIO to read in the alignment
> $str = Bio::AlignIO->new(-file => 't/data/testaln.pfam');
> $aln = $str->next_aln();
>
> # Describe
> print $aln->length;
> print $aln->num_residues;
> print $aln->is_flush;
> print $aln->num_sequences;
> print $aln->score;
> print $aln->percentage_identity;
> print $aln->consensus_string(50);
>
> =================================================================
>
> Note the consensus_string() method:
>
> Title : consensus_string
> Usage : $str = $ali->consensus_string($threshold_percent)
> Function : Makes a strict consensus
> Returns : Consensus string
> Argument : Optional treshold ranging from 0 to 100.
> The consensus residue has to appear at least threshold %
> of the sequences at a given location, otherwise a '?'
> character will be placed at that location.
> (Default value = 0%)
>
> chris
>
> On Oct 18, 2010, at 7:02 PM, Bill Stephens wrote:
>
> > So, I've got the SimpleAlign running. It looks like it's running the
> > alignment based upon the input sequence location only (first residue from
> > each sequence). This is not what I need.
> >
> > I'm back to to clustal, tcoffee or dalign.
> >
> > Bill
> >
> > On Mon, Oct 18, 2010 at 12:09 PM, Jun Yin <jun.yin at ucd.ie> wrote:
> >
> >> Hi, Bill,
> >>
> >> You may consider to use consensus_iupac or consensus_string methods in
> >> Bio::SimpleAlign to generate consensus sequence.
> >>
> >> Cheers,
> >> Jun Yin
> >> Ph.D. student in U.C.D.
> >>
> >> Bioinformatics Laboratory
> >> Conway Institute
> >> University College Dublin
> >>
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >> Sent: Monday, October 18, 2010 3:55 PM
> >> To: Bill Stephens
> >> Cc: bioperl-l at bioperl.org
> >> Subject: Re: [Bioperl-l] Generating a consensus sequence from a Clustal
> >> alignment
> >>
> >> Bill,
> >>
> >> Actually, the page you reached at Pasteur is not Pise, but Mobyle (their
> >> replacement for the older Pise tools). The Pise modules were in
> >> BioPerl-Run, but they were deprecated a few years ago and removed from
> the
> >> latest BioPerl-Run releases b/c the remote service is no longer active;
> >> there is no Perl-based replacement for Mobyle interaction.
> >>
> >> Have you thought about just using the functionality within the
> >> Bio::SimpleAlign class to generate the consensus? I'm pretty sure there
> >> are
> >> methods in place to do that.
> >>
> >> chris
> >>
> >> On Oct 18, 2010, at 8:58 AM, Bill Stephens wrote:
> >>
> >>> All,
> >>>
> >>> I'm in my first week with bioperl for a class project (although I've
> used
> >>> Perl for years). I've successfully run a clustal alignment of several
> DNA
> >>> sequences to produce the aln and dnd files. Now I would like to
> generate
> >> a
> >>> consensus sequence from the alignment. I see that Pise Cons does this
> >>> satisfactorily on my example data (
> >>> http://mobyle.pasteur.fr/cgi-bin/portal.py?form=consensus) . However,
> >> I'm
> >>> not finding Bio::Tools::Run::PiseApplication::cons in the 1.6.1
> >> distribution
> >>> that I installed.
> >>>
> >>> Is this another module that I need to install separately?
> >>>
> >>> "cpan[2]> m /Pise/
> >>> Module Bio::Tools::Run::AnalysisFactory::Pise
> >>> (BIRNEY/bioperl-run-1.4.tar.gz)
> >>> Module Bio::Tools::Run::PiseApplication
> >> (BIRNEY/bioperl-run-1.4.tar.gz)"
> >>>
> >>> Bill S.
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> __________ Information from ESET Smart Security, version of virus
> signature
> >> database 5377 (20100818) __________
> >>
> >> The message was checked by ESET Smart Security.
> >>
> >> http://www.eset.com
> >>
> >>
> >>
> >>
> >> __________ Information from ESET Smart Security, version of virus
> signature
> >> database 5377 (20100818) __________
> >>
> >> The message was checked by ESET Smart Security.
> >>
> >> http://www.eset.com
> >>
> >>
> >>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
More information about the Bioperl-l
mailing list