[Bioperl-l] conservative amino acid change?
Heikki Lehvaslaiho
heikki@ebi.ac.uk
Tue, 07 Aug 2001 11:21:55 +0100
Aaron J Mackey wrote:
>
> On Mon, 6 Aug 2001, Heikki Lehvaslaiho wrote:
>
> > Your plan of starting slowly with simpler tasks and then adding more
> > complex approaches sounds good.
>
> Right: I'd start with Bio::MatrixI.pm and
Let's keep Bio:: clean and use Bio::Matrix::MatrixI.
> Bio::Matrix::SimilarityMatrixI.pm and the methods you proposed.
>
> Then, I'd implement
> Bio::Matrix::SimilarityMatrix
> Bio::Matrix::SimilarityMatrix::PAM
> Bio::Matrix::SimilarityMatrix::BLOSUM
I do not expect there to be dozens of objects here so why not:
Bio::Matrix::Similarity
Bio::Matrix::PAM
Bio::Matrix::BLOSUM
How different can the matrices be? Do we need a class for every type
of similarity matrix, or is it enough to have one class
(Bio::Matrix::Similarity) with an attribute/method format(PAM|BLOSUM)
to tell how the values were generated? Go for separate classes only if
there will be methods in one which are not relevant in the other.
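A minimal sketch of the single-class option, only to make the idea
concrete. My::Matrix::Similarity, its score() method and the -format
argument are hypothetical names chosen for illustration; none of this is
existing BioPerl code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed Bio::Matrix::Similarity.
package My::Matrix::Similarity;

sub new {
    my ($class, %args) = @_;
    my $self = bless { scores => {} }, $class;
    $self->format($args{-format}) if defined $args{-format};
    return $self;
}

# Get/set how the values were generated; only PAM and BLOSUM are accepted.
sub format {
    my ($self, $value) = @_;
    if (defined $value) {
        $value = uc $value;
        die "unknown matrix format '$value'\n"
            unless $value =~ /^(?:PAM|BLOSUM)$/;
        $self->{format} = $value;
    }
    return $self->{format};
}

# Store or fetch the score for a residue pair; the pair is order-independent.
sub score {
    my ($self, $res_a, $res_b, $value) = @_;
    my $key = join '', sort { $a cmp $b } $res_a, $res_b;
    $self->{scores}{$key} = $value if defined $value;
    return $self->{scores}{$key};
}

package main;

my $matrix = My::Matrix::Similarity->new(-format => 'blosum');
$matrix->score('A', 'R', -1);    # toy value, for illustration only
print $matrix->format, ' A/R = ', $matrix->score('R', 'A'), "\n";
```

If PAM and BLOSUM later need methods the other lacks, subclasses can
still be split off without changing callers that only use format() and
score().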
> I suspect that nearly all of these implementations will need to store a
> lower-triangular matrix of values (for PAM, the 1, 10 and 100 or some
> other mix, for BLOSUM all the available ones we want), either in a file or
> via the __DATA__ handle (decision #1). The persistent storage format
Plain files have the problem of not being copied over by perl 'make
install'. Ewan and I were discussing this yesterday and he suggested a
Bio::Resource class which would define environmental parameters and
handle different operating systems. Unless someone wants to implement
this now, I think we should go for using the __DATA__ handle.
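A rough sketch of reading embedded lower-triangular rows, assuming one
row per residue with the residue letter first. In a real module the rows
would sit after the __DATA__ marker and be read through \*DATA; here an
in-memory filehandle stands in so the example is self-contained. The
helper name read_triangular and the numbers are made up for illustration
and are not real PAM values.

```perl
use strict;
use warnings;

# Toy lower-triangular rows (illustration only, not real PAM values).
# In a module these lines would follow __DATA__ and be read via \*DATA.
my $embedded = <<'END';
A 2
R -2 6
N 0 0 2
END

# Hypothetical helper: parse lower-triangular rows into a pair-keyed hash.
sub read_triangular {
    my ($fh) = @_;
    my (@residues, %scores);
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;    # skip blank lines
        my ($residue, @values) = split ' ', $line;
        push @residues, $residue;
        # Value $i on this row pairs the row residue with residue number $i.
        for my $i (0 .. $#values) {
            my $key = join '', sort { $a cmp $b } $residue, $residues[$i];
            $scores{$key} = $values[$i];
        }
    }
    return \%scores;
}

open my $fh, '<', \$embedded or die "cannot open in-memory handle: $!";
my $scores = read_triangular($fh);
close $fh;

print "A/R = $scores->{AR}\n";
```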
> could be triangular or not (decision #2). Regardless, all of the
I suppose triangular matrices are more common. On the other hand,
symmetric matrices are a special case of asymmetric matrices, so
input/output could default to full. Your call.
> implementations could benefit from an IO class that was capable of reading
> full or triangular matrices, and an IO class that was capable of writing
> out full or triangular matrices. I'd think that the same class could be
> smart enough to figure out triangular or not.
Why not? If you give an attribute to the IO object that determines the
matrix format (full/triangular), write() can act accordingly.
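As a sketch of how such an attribute could steer the output, assuming
scores are kept in a hash keyed by the sorted residue pair: the helper
format_matrix and the -triangular option are hypothetical, and the values
are toy numbers.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl API: render a symmetric score table
# either in full or as a lower triangle (diagonal included).
sub format_matrix {
    my ($scores, $residues, %opt) = @_;
    my $out = '';
    for my $i (0 .. $#$residues) {
        # A triangular row stops at the diagonal; a full row spans all columns.
        my $last = $opt{-triangular} ? $i : $#$residues;
        my @row = map {
            my ($r, $c) = sort { $a cmp $b } $residues->[$i], $residues->[$_];
            $scores->{"$r$c"};
        } 0 .. $last;
        $out .= join(' ', $residues->[$i], @row) . "\n";
    }
    return $out;
}

# Toy values for illustration only, keyed by the sorted residue pair.
my %scores   = (AA => 2, AR => -2, RR => 6);
my @residues = ('A', 'R');

print format_matrix(\%scores, \@residues);                      # full (default)
print format_matrix(\%scores, \@residues, -triangular => 1);    # lower triangle
```

Because only one value is stored per pair, the full form is simply the
triangle mirrored at output time, so defaulting to either layout costs
nothing in storage.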
> Are you thinking something akin to this: ??
>
> use Bio::MatrixIO;
>
> my $matrixio = new Bio::MatrixIO file => 'pam120.mat';
> my $pam120 = $matrixio->read(); # isa SimilarityMatrix, but not a ::PAM
> { triangular => 1 }); # default is
My idea was more like the above one:
use Bio::Matrix::IO; # inherits from Bio::SeqIO
my $matrixin = Bio::Matrix::IO->new
    (-format   => 'pam', # format is inherited from SeqIO
     -distance => 20,    # throws if not available/cannot be generated
     #-file    => 'pam120.mat', # optional -file
    ); # uses Bio::Matrix::IO::pam behind the scenes
my $pam120 = $matrixin->read(); # isa Bio::Matrix::Similarity
my $matrixout = Bio::Matrix::IO->new
    (-triangular => 1); # STDOUT
                        # default is full (?)
$matrixout->write($pam120);
> Or more like this:
>
> use Bio::Matrix::SimilarityMatrix::PAM;
>
> my $pam120 = new Bio::Matrix::SimilarityMatrix::PAM file => 'pam120.mat';
> # pam120 isa SimilarityMatrix::PAM
>
> so that MatrixI needs to provide some simple matrix read/write
> functionality that can be used by all its descendants?
>
> If you want to build the BioPerl sanctified infrastructure here, I'd be
> willing to provide some of the guts.
>
> -Aaron
>
> P.S. And yes, let's not think at all about
> Bio::Matrix::InstaneousRateMatrix::PAM/WAG/JTT.pm quite yet, but what I
> envisioned was something like:
>
> my $pam = new Bio::Matrix::InstaneousRateMatrix::PAM;
> my $pam120 = $pam->substitutionmatrix(120);
> $pam120->write(file => 'pam120.mat');
How about:
use Bio::Matrix::IO; # inherits from Bio::SeqIO
my $matrixin = Bio::Matrix::IO->new
    (-format   => 'pam',
     -type     => 'instant', # defaults to 'read'
     -distance => 20
    ); # uses Bio::Matrix::Factory::pam behind the scenes
my $pam120 = $matrixin->read; # isa Bio::Matrix::Similarity
In that way we can use one interface (IO) to mask both reading in and
generating matrices.
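A minimal sketch of that dispatch, assuming the -type argument selects
a backend and read() hides which one ran. My::Matrix::IO and the two
private backends are hypothetical; they return a placeholder record
instead of real scores.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed Bio::Matrix::IO.
package My::Matrix::IO;

sub new {
    my ($class, %args) = @_;
    my $type = defined $args{-type} ? $args{-type} : 'read';
    # One backend per -type; callers never see which one is used.
    my %backend = (
        read    => \&_read_stored,
        instant => \&_generate,
    );
    my $code = $backend{$type} or die "unknown -type '$type'\n";
    return bless { %args, -type => $type, backend => $code }, $class;
}

# Single entry point, whether the matrix is loaded or generated.
sub read {
    my ($self) = @_;
    return $self->{backend}->($self);
}

# Placeholder: a real backend would parse stored values here.
sub _read_stored {
    my ($self) = @_;
    return { source => 'stored', distance => $self->{-distance} };
}

# Placeholder: a real backend would derive the matrix from a rate matrix.
sub _generate {
    my ($self) = @_;
    return { source => 'generated', distance => $self->{-distance} };
}

package main;

my $in = My::Matrix::IO->new(-format => 'pam', -type => 'instant',
                             -distance => 20);
my $result = $in->read;
print "$result->{source} matrix at distance $result->{distance}\n";
```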
-Heikki
> or some such.
>
> --
> o ~ ~ ~ ~ ~ ~ o
> / Aaron J Mackey \
> \ Dr. Pearson Laboratory /
> \ University of Virginia \
> / (434) 924-2821 \
> \ amackey@virginia.edu /
> o ~ ~ ~ ~ ~ ~ o
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________