[Bioperl-l] Generating a consensus sequence from a Clustal alignment

Tue Oct 19 15:47:16 UTC 2010

Bio::SimpleAlign is the class that contains the alignment data; it does not generate the alignment for you.  You can use modules from BioPerl-Run that run ClustalW, MUSCLE, T-Coffee, etc. to get a Bio::SimpleAlign instance, or parse already-generated alignment output via Bio::AlignIO.
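
For the second route (an .aln file that ClustalW has already written), a minimal sketch might look like the following.  The helper name consensus_from_clustal, the script name, and the file name in the usage line are made up for illustration; the BioPerl calls themselves (Bio::AlignIO->new with -format => 'clustalw', next_aln, consensus_string, consensus_iupac) are existing methods.  Note that consensus_iupac is meant for DNA/RNA alignments only.

```perl
use strict;
use warnings;
use Bio::AlignIO;

# Hypothetical helper (not a BioPerl function): read the first alignment
# from a ClustalW-format file and return two flavours of consensus.
sub consensus_from_clustal {
    my ($path, $threshold_percent) = @_;

    # Parse the existing ClustalW output; nothing is re-aligned here.
    my $in  = Bio::AlignIO->new(-file => $path, -format => 'clustalw');
    my $aln = $in->next_aln()
        or die "No alignment found in $path\n";

    return (
        # '?' is placed wherever no residue reaches the threshold
        $aln->consensus_string($threshold_percent),
        # IUPAC ambiguity codes; DNA/RNA alignments only
        $aln->consensus_iupac(),
    );
}

# Usage (file name is an assumption):
#   perl consensus.pl my_alignment.aln [threshold]
if (@ARGV) {
    my ($path, $threshold) = @ARGV;
    my ($strict, $iupac) = consensus_from_clustal($path, $threshold // 50);
    print "strict: $strict\n";
    print "iupac:  $iupac\n";
}
```

Wrapping the calls in a sub is just a convenience so the same code can be reused for several alignment files; the three method calls are the part that matters.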

From the Bio::Tools::Run::Alignment::Clustalw docs:

=================================================================

  use Bio::Tools::Run::Alignment::Clustalw;

  #  Build a clustalw alignment factory
  @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
  $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);

  #  Pass the factory the name of a file of sequences to be aligned.
  $inputfilename = 't/data/cysprot.fa';
  $aln = $factory->align($inputfilename); # $aln is a SimpleAlign object.

  # ...or
  $seq_array_ref = \@seq_array;

  # where @seq_array is an array of Bio::Seq objects
  $aln = $factory->align($seq_array_ref);

=================================================================

$aln is a Bio::SimpleAlign derived from ClustalW output.  

From Bio::SimpleAlign (note the use of Bio::AlignIO):

=================================================================

  # Use Bio::AlignIO to read in the alignment
  use Bio::AlignIO;
  $str = Bio::AlignIO->new(-file => 't/data/testaln.pfam');
  $aln = $str->next_aln();

  # Describe
  print $aln->length;
  print $aln->num_residues;
  print $aln->is_flush;
  print $aln->num_sequences;
  print $aln->score;
  print $aln->percentage_identity;
  print $aln->consensus_string(50);

=================================================================

Note the consensus_string() method:

 Title     : consensus_string
 Usage     : $str = $ali->consensus_string($threshold_percent)
 Function  : Makes a strict consensus
 Returns   : Consensus string
 Argument  : Optional threshold ranging from 0 to 100.
             The consensus residue has to appear in at least threshold %
             of the sequences at a given location, otherwise a '?'
             character will be placed at that location.
             (Default value = 0%)
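
To make the threshold rule concrete, here is a rough plain-Perl sketch of a strict consensus over a few already-aligned strings.  It is only an illustration of the rule as documented above, not the BioPerl source: the helper name strict_consensus, the sample rows, the choice to skip '-' and '.' gap characters, and the alphabetical tie-breaking are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: strict consensus of
# equal-length aligned strings with a percentage threshold.
sub strict_consensus {
    my ($threshold_percent, @rows) = @_;
    $threshold_percent = 0 unless defined $threshold_percent;

    # Minimum number of sequences that must share a residue.
    my $needed    = scalar(@rows) * $threshold_percent / 100;
    my $consensus = '';

    for my $col (0 .. length($rows[0]) - 1) {
        # Count residues in this column, ignoring gap characters.
        my %count;
        for my $row (@rows) {
            my $residue = substr($row, $col, 1);
            next if $residue eq '-' or $residue eq '.';
            $count{$residue}++;
        }

        # Most frequent residue wins if it meets the threshold;
        # ties go to the alphabetically first residue (an assumption).
        my ($best, $best_count) = ('?', 0);
        for my $residue (sort keys %count) {
            if ($count{$residue} > $best_count && $count{$residue} >= $needed) {
                ($best, $best_count) = ($residue, $count{$residue});
            }
        }
        $consensus .= $best;
    }
    return $consensus;
}

my @rows = ('ACGTA', 'ACGTC', 'ACCTG', 'A-GAT');
print strict_consensus(0,   @rows), "\n";   # prints ACGTA
print strict_consensus(75,  @rows), "\n";   # prints ACGT?
print strict_consensus(100, @rows), "\n";   # prints A????
```

With four sequences, a threshold of 75 means a residue must appear in at least three of them, so the last column (four different bases) becomes '?'.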

chris

On Oct 18, 2010, at 7:02 PM, Bill Stephens wrote:

> So, I've got the SimpleAlign running. It looks like it's running the
> alignment based upon the input sequence location only (first residue from
> each sequence).  This is not what I need.
> 
> I'm back to clustal, tcoffee or dalign.
> 
> Bill
> 
> On Mon, Oct 18, 2010 at 12:09 PM, Jun Yin <jun.yin at ucd.ie> wrote:
> 
>> Hi, Bill,
>> 
>> You may consider using the consensus_iupac or consensus_string methods in
>> Bio::SimpleAlign to generate a consensus sequence.
>> 
>> Cheers,
>> Jun Yin
>> Ph.D. student in U.C.D.
>> 
>> Bioinformatics Laboratory
>> Conway Institute
>> University College Dublin
>> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Monday, October 18, 2010 3:55 PM
>> To: Bill Stephens
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] Generating a consensus sequence from a Clustal
>> alignment
>> 
>> Bill,
>> 
>> Actually, the page you reached at Pasteur is not Pise, but Mobyle (their
>> replacement for the older Pise tools).  The Pise modules were in
>> BioPerl-Run, but they were deprecated a few years ago and removed from the
>> latest BioPerl-Run releases b/c the remote service is no longer active;
>> there is no Perl-based replacement for Mobyle interaction.
>> 
>> Have you thought about just using the functionality within the
>> Bio::SimpleAlign class to generate the consensus?  I'm pretty sure there
>> are methods in place to do that.
>> 
>> chris
>> 
>> On Oct 18, 2010, at 8:58 AM, Bill Stephens wrote:
>> 
>>> All,
>>> 
>>> I'm in my first week with bioperl for a class project (although I've used
>>> Perl for years). I've successfully run a clustal alignment of several DNA
>>> sequences to produce the aln and dnd files. Now I would like to generate a
>>> consensus sequence from the alignment.  I see that Pise Cons does this
>>> satisfactorily on my example data (
>>> http://mobyle.pasteur.fr/cgi-bin/portal.py?form=consensus). However, I'm
>>> not finding Bio::Tools::Run::PiseApplication::cons in the 1.6.1 distribution
>>> that I installed.
>>> 
>>> Is this another module that I need to install separately?
>>> 
>>> "cpan[2]> m /Pise/
>>> Module    Bio::Tools::Run::AnalysisFactory::Pise (BIRNEY/bioperl-run-1.4.tar.gz)
>>> Module    Bio::Tools::Run::PiseApplication (BIRNEY/bioperl-run-1.4.tar.gz)"
>>> 
>>> Bill S.
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l