[Bioperl-l] Recovering conservation lines from clustalw

Peter Schattner schattner@alum.mit.edu
Fri, 11 May 2001 12:30:33 -0700


Brad, Josep

Thanks for sending me sample clustalw files with conservation lines.
Now that I am clear about what they are, I can say with certainty that
AlignIO does not parse the conservation lines.  However, SimpleAlign does
provide a method, consensus_string().  When called with an optional
threshold ranging from 0 to 100, consensus_string() returns the consensus
residue at a position only if it is found in more than the threshold %
of the sequences.  Otherwise consensus_string() returns a "?" at that
location.

Typical usage is:
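  use strict;
  use warnings;
  use Bio::SimpleAlign;
  use Bio::AlignIO;

  my $infile = shift @ARGV;   # path to the alignment file
  # '-format' must match the format of the file being read
  my $in  = Bio::AlignIO->new('-file' => $infile, '-format' => 'msf');
  my $aln = $in->next_aln();  # the alignment, as a SimpleAlign object
  my $threshold_percent = 60;
  my $str = $aln->consensus_string($threshold_percent);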
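
To make the threshold rule concrete, here is a minimal pure-Perl sketch
of the idea.  It is not the SimpleAlign implementation:
threshold_consensus() is a hypothetical helper, and its handling of ties
(broken alphabetically) and gaps (treated like any other character) is an
assumption made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of bioperl.  Returns one character per
# alignment column: the most common residue if it occurs in more than
# $threshold percent of the sequences, otherwise "?".
# Assumes all sequences are already aligned to the same length.
sub threshold_consensus {
    my ($threshold, @seqs) = @_;
    my $consensus = '';
    for my $col (0 .. length($seqs[0]) - 1) {
        my %count;
        $count{ substr($_, $col, 1) }++ for @seqs;
        # Most frequent residue first; ties are broken alphabetically
        # (an assumption of this sketch).
        my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b }
                     keys %count;
        my $percent = 100 * $count{$best} / scalar(@seqs);
        $consensus .= $percent > $threshold ? $best : '?';
    }
    return $consensus;
}

my @aligned = ('ACGT', 'ACGA', 'ACCA', 'TCCT');
print threshold_consensus(60, @aligned), "\n";   # prints AC??
```

In this toy alignment the first two columns are 75% and 100% conserved,
so they pass a threshold of 60, while the last two columns are split
50/50 and are reported as "?".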

See the SimpleAlign documentation or the bioperl tutorial for more
info.  Admittedly this is more cumbersome than simply reading the
information from the file, but hopefully it helps.

For fancier "slicing and dicing" of alignments you will need to use
UnivAln, for which clustal format IO is not currently supported by
bioperl.

Regards

Peter