[Bioperl-l] conservative amino acid change?

Heikki Lehvaslaiho heikki@ebi.ac.uk
Tue, 07 Aug 2001 11:21:55 +0100


Aaron J Mackey wrote:
> 
> On Mon, 6 Aug 2001, Heikki Lehvaslaiho wrote:
> 
> > Your plan of starting slowly with simpler tasks and then adding more
> > complex approaches sounds good.
> 
> Right: I'd start with Bio::MatrixI.pm and

Let's keep Bio:: clean and use Bio::Matrix::MatrixI.

> Bio::Matrix::SimilarityMatrixI.pm and the methods you proposed.
> 
> Then, I'd implement
> Bio::Matrix::SimilarityMatrix
> Bio::Matrix::SimilarityMatrix::PAM
> Bio::Matrix::SimilarityMatrix::BLOSUM

I do not expect there to be dozens of objects here, so why not:

Bio::Matrix::Similarity
Bio::Matrix::PAM
Bio::Matrix::BLOSUM

How different can the matrices be? Do we need a class for every type
of similarity matrix, or is it enough to have one class
(Bio::Matrix::Similarity) with an attribute/method format(PAM|BLOSUM)
to tell how the values were generated? Go for separate classes only if
there will be methods in one which are not relevant in the other.
 
> I suspect that nearly all of these implementations will need to store a
> lower-triangular matrix of values (for PAM, the 1, 10 and 100 or some
> other mix, for BLOSUM all the available ones we want), either in a file or
> via the __DATA__ handle (decision #1).  The persistent storage format

Plain files have the problem of not being copied over by perl 'make
install'. Ewan and I were discussing this yesterday, and he suggested
a Bio::Resource class which would define environmental parameters and
handle different operating systems. Unless someone wants to implement
this now, I think we should go for using the __DATA__ handle.
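A minimal sketch of reading a lower-triangular matrix stored after __DATA__ and expanding it to a full symmetric lookup. To keep the example self-contained, an in-memory filehandle stands in for the DATA handle (in a module you would read from <DATA> instead); the three-letter layout is an assumption, with BLOSUM62 values.

```perl
use strict;
use warnings;

# Lower-triangular matrix as it might sit under __DATA__.
# An in-memory filehandle stands in for the DATA handle here.
my $text = <<'END';
   A  R  N
A  4
R -1  5
N -2  0  6
END

open my $fh, '<', \$text or die "cannot open in-memory handle: $!";

my $header   = <$fh>;
my @alphabet = split ' ', $header;
my %score;

while (my $line = <$fh>) {
    my ($row, @values) = split ' ', $line;
    next unless defined $row;    # skip blank lines
    for my $j (0 .. $#values) {
        my $col = $alphabet[$j];
        # Store both halves so lookups never depend on pair order
        $score{$row}{$col} = $values[$j];
        $score{$col}{$row} = $values[$j];
    }
}
close $fh;

print "R/N = $score{R}{N}, N/R = $score{N}{R}\n";
```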

> could be triangular or not (decision #2).  Regardless, all of the

I suppose triangular matrices are more common. On the other hand,
symmetric matrices are a special case of asymmetric matrices, so
input/output could default to full. Your call.

> implementations could benefit from an IO class that was capable of reading
> full or triangular matrices, and an IO class that was capable of writing
> out full or triangular matrices.  I'd think that the same class could be
> smart enough to figure out triangular or not.

Why not? If you give the IO object an attribute that determines the
matrix format (full/triangular), write() can act accordingly.
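One way the reader could work out the layout by itself while the writer obeys a triangular attribute, sketched with two hypothetical subs (read_matrix, write_matrix) that are not part of any BioPerl API. The detection rule assumed here: the first data row of a triangular file has fewer values than the alphabet has letters.

```perl
use strict;
use warnings;

# Hypothetical writer: the 'triangular' option decides whether only
# the lower triangle, diagonal included, is printed.
sub write_matrix {
    my ($alphabet, $score, %opt) = @_;
    my $out = join(' ', '', @$alphabet) . "\n";    # header row
    for my $i (0 .. $#$alphabet) {
        my $last = $opt{triangular} ? $i : $#$alphabet;
        my @row = map { $score->{ $alphabet->[$i] }{ $alphabet->[$_] } } 0 .. $last;
        $out .= join(' ', $alphabet->[$i], @row) . "\n";
    }
    return $out;
}

# Hypothetical reader that detects the layout from the first data row.
sub read_matrix {
    my ($text) = @_;
    my ($header, @lines) = split /\n/, $text;
    my @alphabet = split ' ', $header;
    my (undef, @first) = split ' ', $lines[0];
    my $triangular = @first < @alphabet ? 1 : 0;
    my %score;
    for my $line (@lines) {
        my ($row, @values) = split ' ', $line;
        for my $j (0 .. $#values) {
            my $col = $alphabet[$j];
            $score{$row}{$col} = $values[$j];
            # Only a triangular file needs its upper half mirrored in;
            # a full file may hold an asymmetric matrix and is kept as is.
            $score{$col}{$row} = $values[$j] if $triangular;
        }
    }
    return (\@alphabet, \%score, $triangular);
}

my ($alpha, $score, $tri) = read_matrix("  A R\nA 4\nR -1 5\n");
my $full = write_matrix($alpha, $score);
my $half = write_matrix($alpha, $score, triangular => 1);
print $full, $half;
```

Mirroring only triangular input keeps full, possibly asymmetric, matrices intact, in line with treating symmetric matrices as the special case.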

> Are you thinking something akin to this: ??
> 
> use Bio::MatrixIO;
> 
> my $matrixio = new Bio::MatrixIO file => 'pam120.mat';
> my $pam120 = $matrixio->read(); # isa SimilarityMatrix, but not a ::PAM
>  { triangular => 1 }); # default is

My idea was more like the one above:

use Bio::Matrix::IO; # inherit from Bio::SeqIO

my $matrixin = Bio::Matrix::IO->new 
     (-format => 'pam',  # format is inherited from SeqIO
      -distance => 120, # throws if not available/cannot be generated
      #-file => 'pam120.mat', # optional -file
     ); # uses Bio::Matrix::IO::pam behind the scenes
my $pam120 = $matrixin->read(); # isa Bio::Matrix::Similarity
my $matrixout = Bio::Matrix::IO->new
    (-triangular => 1); # writes to STDOUT
                        # default is full (?)
$matrixout->write($pam120);
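The "behind the scenes" dispatch can be sketched in the same way SeqIO picks a driver from -format. The class names MatrixIO and MatrixIO::pam, the new_driver method and the list of available distances below are all hypothetical, chosen only to illustrate the idea.

```perl
use strict;
use warnings;

# Hypothetical SeqIO-style factory (names are illustrative, not
# BioPerl's): new() picks a driver class from -format and passes
# the remaining arguments on to it.
package MatrixIO;

sub new {
    my ($class, %args) = @_;
    my $format = lc($args{'-format'} || 'full');
    my $driver = "MatrixIO::$format";
    die "unknown matrix format '$format'\n" unless $driver->can('new_driver');
    return $driver->new_driver(%args);
}

# Toy driver for the 'pam' format; the available distances are made up.
package MatrixIO::pam;

my %available = (30 => 1, 70 => 1, 120 => 1, 250 => 1);

sub new_driver {
    my ($class, %args) = @_;
    my $distance = $args{'-distance'};
    # As proposed: throw if the distance is not available
    die "requested PAM distance is not available\n"
        unless defined $distance && $available{$distance};
    return bless { distance => $distance }, $class;
}

sub distance { return $_[0]{distance} }

package main;

my $io = MatrixIO->new(-format => 'pam', -distance => 120);
print ref($io), ' ', $io->distance, "\n";    # MatrixIO::pam 120

my $ok = eval { MatrixIO->new(-format => 'pam', -distance => 7); 1 };
print "rejected: $@" unless $ok;
```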


> Or more like this:
> 
> use Bio::Matrix::SimilarityMatrix::PAM;
> 
> my $pam120 = new Bio::Matrix::SimilarityMatrix::PAM file => 'pam120.mat';
> # pam120 isa SimilarityMatrix::PAM
> 
> so that MatrixI needs to provide some simple matrix read/write
> functionality that can be used by all its descendants?
> 
> If you want to build the BioPerl sanctified infrastructure here, I'd be
> willing to provide some of the guts.
> 
> -Aaron
> 
> P.S. And yes, let's not think at all about
> Bio::Matrix::InstantaneousRateMatrix::PAM/WAG/JTT.pm quite yet, but what I
> envisioned was something like:
> 
> my $pam = new Bio::Matrix::InstantaneousRateMatrix::PAM;
> my $pam120 = $pam->substitutionmatrix(120);
> $pam120->write(file => 'pam120.mat');

How about:

use Bio::Matrix::IO; # inherit from Bio::SeqIO

my $matrixin = Bio::Matrix::IO->new 
     (-format => 'pam',  
      -type => 'instant', #defaults to 'read'
      -distance => 120
     ); # uses behind the scenes Bio::Matrix::Factory::pam
my $pam120 = $matrixin->read; # isa Bio::Matrix::Similarity


That way, one interface (IO) can hide whether a matrix is read in
from storage or generated.
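For the generating side, in Dayhoff's scheme the PAM-n mutation probability matrix is the PAM1 matrix raised to the n-th power; log-odds scores are derived from it afterwards, a step left out here. A minimal sketch of the power step using repeated squaring, with an invented two-letter toy matrix rather than real PAM1 data.

```perl
use strict;
use warnings;

# Multiply two square matrices stored as arrays of array refs
sub mat_mult {
    my ($x, $y) = @_;
    my $n = @$x;
    my @p;
    for my $i (0 .. $n - 1) {
        for my $j (0 .. $n - 1) {
            my $sum = 0;
            $sum += $x->[$i][$_] * $y->[$_][$j] for 0 .. $n - 1;
            $p[$i][$j] = $sum;
        }
    }
    return \@p;
}

# Raise a square matrix to a non-negative integer power by repeated
# squaring, so PAM-n needs O(log n) multiplications instead of n.
sub mat_power {
    my ($m, $n) = @_;
    my $size = @$m;
    # start from the identity matrix
    my $result = [ map { my $i = $_; [ map { $_ == $i ? 1 : 0 } 0 .. $size - 1 ] } 0 .. $size - 1 ];
    my $base = $m;
    while ($n > 0) {
        $result = mat_mult($result, $base) if $n & 1;
        $base = mat_mult($base, $base);
        $n >>= 1;
    }
    return $result;
}

# Toy two-letter "PAM1" mutation probability matrix (rows sum to 1);
# the numbers are invented for the example, not Dayhoff's data.
my $pam1   = [ [0.99, 0.01], [0.01, 0.99] ];
my $pam120 = mat_power($pam1, 120);
printf "%.4f %.4f\n", @{ $pam120->[0] };
```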

	-Heikki

> or some such.
> 
> --
>  o ~   ~   ~   ~   ~   ~  o
> / Aaron J Mackey           \
> \  Dr. Pearson Laboratory  /
>  \ University of Virginia  \
>  /  (434) 924-2821          \
>  \  amackey@virginia.edu    /
>   o ~   ~   ~   ~   ~   ~  o

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________