[Bioperl-l] Generating a consensus sequence from a Clustal alignment
Chris Fields
cjfields at illinois.edu
Tue Oct 19 15:47:16 UTC 2010
Bio::SimpleAlign is the class that contains the alignment data; it does not generate the alignment for you. You can use modules from BioPerl-Run that run ClustalW, MUSCLE, T-Coffee, etc to get a Bio::SimpleAlign instance, or parse the already-generated alignment output via Bio::AlignIO.
>From the Bio::Tools::Run::Alignment::ClustalW docs:
=================================================================
# Build a clustalw alignment factory
@params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
$factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
# Pass the factory a list of sequences to be aligned.
$inputfilename = 't/data/cysprot.fa';
$aln = $factory->align($inputfilename); # $aln is a SimpleAlign object.
# ...or
$seq_array_ref = \@seq_array;
# where @seq_array is an array of Bio::Seq objects
$aln = $factory->align($seq_array_ref);
=================================================================
$aln is a Bio::SimpleAlign derived from ClustalW output.
>From Bio::SimpleAlign (note the use of Bio::AlignIO):
=================================================================
# Use Bio::AlignIO to read in the alignment
$str = Bio::AlignIO->new(-file => 't/data/testaln.pfam');
$aln = $str->next_aln();
# Describe
print $aln->length;
print $aln->num_residues;
print $aln->is_flush;
print $aln->num_sequences;
print $aln->score;
print $aln->percentage_identity;
print $aln->consensus_string(50);
=================================================================
Note the consensus_string() method:
Title : consensus_string
Usage : $str = $ali->consensus_string($threshold_percent)
Function : Makes a strict consensus
Returns : Consensus string
Argument : Optional treshold ranging from 0 to 100.
The consensus residue has to appear at least threshold %
of the sequences at a given location, otherwise a '?'
character will be placed at that location.
(Default value = 0%)
chris
On Oct 18, 2010, at 7:02 PM, Bill Stephens wrote:
> So, I've got the SimpleAlign running. It looks like it's running the
> alignment based upon the input sequence location only (first residue from
> each sequence). This is not what I need.
>
> I'm back to to clustal, tcoffee or dalign.
>
> Bill
>
> On Mon, Oct 18, 2010 at 12:09 PM, Jun Yin <jun.yin at ucd.ie> wrote:
>
>> Hi, Bill,
>>
>> You may consider to use consensus_iupac or consensus_string methods in
>> Bio::SimpleAlign to generate consensus sequence.
>>
>> Cheers,
>> Jun Yin
>> Ph.D. student in U.C.D.
>>
>> Bioinformatics Laboratory
>> Conway Institute
>> University College Dublin
>>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Monday, October 18, 2010 3:55 PM
>> To: Bill Stephens
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] Generating a consensus sequence from a Clustal
>> alignment
>>
>> Bill,
>>
>> Actually, the page you reached at Pasteur is not Pise, but Mobyle (their
>> replacement for the older Pise tools). The Pise modules were in
>> BioPerl-Run, but they were deprecated a few years ago and removed from the
>> latest BioPerl-Run releases b/c the remote service is no longer active;
>> there is no Perl-based replacement for Mobyle interaction.
>>
>> Have you thought about just using the functionality within the
>> Bio::SimpleAlign class to generate the consensus? I'm pretty sure there
>> are
>> methods in place to do that.
>>
>> chris
>>
>> On Oct 18, 2010, at 8:58 AM, Bill Stephens wrote:
>>
>>> All,
>>>
>>> I'm in my first week with bioperl for a class project (although I've used
>>> Perl for years). I've successfully run a clustal alignment of several DNA
>>> sequences to produce the aln and dnd files. Now I would like to generate
>> a
>>> consensus sequence from the alignment. I see that Pise Cons does this
>>> satisfactorily on my example data (
>>> http://mobyle.pasteur.fr/cgi-bin/portal.py?form=consensus) . However,
>> I'm
>>> not finding Bio::Tools::Run::PiseApplication::cons in the 1.6.1
>> distribution
>>> that I installed.
>>>
>>> Is this another module that I need to install separately?
>>>
>>> "cpan[2]> m /Pise/
>>> Module Bio::Tools::Run::AnalysisFactory::Pise
>>> (BIRNEY/bioperl-run-1.4.tar.gz)
>>> Module Bio::Tools::Run::PiseApplication
>> (BIRNEY/bioperl-run-1.4.tar.gz)"
>>>
>>> Bill S.
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> __________ Information from ESET Smart Security, version of virus signature
>> database 5377 (20100818) __________
>>
>> The message was checked by ESET Smart Security.
>>
>> http://www.eset.com
>>
>>
>>
>>
>> __________ Information from ESET Smart Security, version of virus signature
>> database 5377 (20100818) __________
>>
>> The message was checked by ESET Smart Security.
>>
>> http://www.eset.com
>>
>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list