[Bioperl-l] Recovering conservation lines from clustalw
Peter Schattner
schattner@alum.mit.edu
Fri, 11 May 2001 12:30:33 -0700
Brad, Josep
Thanks for sending me sample clustalw files with conservation lines.
Now that I am clear as to what they are, I can say with certainty that
AlignIO does not parse the conservation lines. However, SimpleAlign does
provide a related method, consensus_string(). When called with an optional
threshold ranging from 0 to 100, consensus_string returns the consensus
residue at a position only if it is found in more than the threshold % of
the sequences. Otherwise consensus_string will return a "?" at that
location.
Typical usage is:
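use Bio::SimpleAlign;
use Bio::AlignIO;
# $infile holds the path to the alignment file
my $in = Bio::AlignIO->new('-file' => $infile, '-format' => 'msf');
my $aln = $in->next_aln();
my $threshold_percent = 60;
# "?" marks positions where no residue exceeds the threshold
my $str = $aln->consensus_string($threshold_percent);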
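To make the threshold rule concrete, here is a minimal standalone sketch of
the behaviour described above. It does not use bioperl and is not the
SimpleAlign implementation; threshold_consensus is a hypothetical helper, and
its handling of ties (alphabetical) and of gap characters (counted like any
other character) is an assumption made only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of bioperl: builds a thresholded consensus
# from aligned sequence strings that are assumed to be of equal length.
sub threshold_consensus {
    my ($threshold_percent, @seqs) = @_;
    return '' unless @seqs;
    my $consensus = '';
    for my $col (0 .. length($seqs[0]) - 1) {
        # Count how often each character occurs in this column.
        my %count;
        $count{ substr($_, $col, 1) }++ for @seqs;
        # Most frequent character; ties are broken alphabetically (assumption).
        my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        my $percent = 100 * $count{$best} / @seqs;
        # Keep the residue only if it is found in MORE than the threshold %.
        $consensus .= $percent > $threshold_percent ? $best : '?';
    }
    return $consensus;
}

my @aligned = ('ACGT', 'ACGA', 'ACTA', 'AGTC');
print threshold_consensus(60, @aligned), "\n";    # prints AC??
```

In this toy alignment the first two columns are 100% and 75% conserved, so
they pass a threshold of 60; the last two columns peak at 50% and come back
as "?".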
See the SimpleAlign documentation or the bioperl tutorial for more
info. Admittedly this is more cumbersome than simply reading the
information in the file, but hopefully it helps.
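If the conservation marks themselves are needed, they can be read from the
file directly, without AlignIO. The following is only a rough sketch under
stated assumptions: conservation_from_clustal is a hypothetical helper (not a
bioperl function), the header line is assumed to start with "CLUSTAL", each
sequence line is assumed to be "name, whitespace, residues", and the
conservation line is assumed to be the first line after each block of
sequence lines, aligned to the same column as the residues.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of bioperl. Takes the lines of a clustalw
# file and returns the concatenated conservation marks ('*', ':', '.', ' ').
sub conservation_from_clustal {
    my @lines = @_;
    my ($offset, $width, $pending) = (0, 0, 0);
    my $conservation = '';
    for my $line (@lines) {
        $line =~ s/[\r\n]+\z//;
        next if $line =~ /^CLUSTAL/;          # skip the header line
        if ($line =~ /^(\S+\s+)(\S+)/) {
            # Sequence line: remember where the residues start and how many
            # columns this block has.
            ($offset, $width, $pending) = (length $1, length $2, 1);
        }
        elsif ($pending) {
            # First non-sequence line after a block: the conservation line.
            # It may be short or empty if trailing spaces were stripped.
            my $marks = length($line) > $offset ? substr($line, $offset) : '';
            $conservation .= sprintf '%-*s', $width, substr($marks, 0, $width);
            $pending = 0;
        }
    }
    # Last block had no line after it: treat it as entirely unconserved.
    $conservation .= ' ' x $width if $pending;
    return $conservation;
}

my @file_lines = (
    'CLUSTAL W (1.81) multiple sequence alignment',
    '',
    'seq1      ACGT',
    'seq2      ACGA',
    '          ***',
    '',
    'seq1      GG',
    'seq2      GC',
    '          * ',
);
print '[', conservation_from_clustal(@file_lines), "]\n";    # prints [*** * ]
```

The returned string has one character per alignment column, so it can be
compared position by position with the sequences from the same file.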
For fancier "slicing and dicing" of alignments you will need to use
UnivAln, although bioperl does not currently support clustal format IO
for UnivAln.
Regards
Peter