[Bioperl-l] Questions on Representing Protein Ambiguity

James Thompson tex at biosysadmin.com
Sun Oct 3 06:15:04 EDT 2004


Aaron,

Thanks for the feedback. You're definitely right about consensus sequences
being relatively worthless when compared to the information contained in the
whole profile.

Friday afternoon I committed some changes to ProtMatrix.pm that allow the regexp
method to take a threshold as an argument, and it's not too hard to change.
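
To illustrate the thresholded-pattern idea, here is a minimal sketch, written
in Python rather than Perl. It is not the ProtMatrix.pm code: the function
names, the profile layout (a list of per-position residue-to-frequency dicts),
and the default thresholds are all assumptions made for the example. It shows
both flattenings discussed in this thread: the bracketed character-class
pattern, and the "most probable residue, else X" consensus.

```python
from typing import Dict, List

# Hypothetical profile layout: one {residue: frequency} dict per position.
Profile = List[Dict[str, float]]


def profile_to_regexp(profile: Profile, threshold: float = 0.2) -> str:
    """Flatten a profile to a regex-style pattern.

    Residues whose frequency is >= threshold are kept at each position.
    One survivor is written bare, several are written as a bracketed
    class (most frequent first), and none is written as '.'.
    """
    parts = []
    for column in profile:
        kept = sorted(
            (res for res, freq in column.items() if freq >= threshold),
            key=lambda res: (-column[res], res),
        )
        if not kept:
            parts.append(".")
        elif len(kept) == 1:
            parts.append(kept[0])
        else:
            parts.append("[" + "".join(kept) + "]")
    return "".join(parts)


def profile_to_consensus(profile: Profile, threshold: float = 0.5) -> str:
    """Take the most probable residue per position, or 'X' below threshold."""
    out = []
    for column in profile:
        if not column:
            out.append("X")
            continue
        # Highest frequency wins; ties are broken alphabetically.
        residue = min(column, key=lambda res: (-column[res], res))
        out.append(residue if column[residue] >= threshold else "X")
    return "".join(out)
```

With a made-up five-position profile in which position 2 is split between E
and S and position 4 between I, A and P, a low threshold keeps the
alternatives in brackets, while raising it collapses the pattern toward the
plain consensus. Either way, the frequencies themselves are discarded, which
is the information loss Aaron describes below.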

The Bio::Tools::dpAlign idea looks interesting; I'd never seen it before
myself. Sometime down the road I'll look into making it use matrices from the
Bio::Matrix::PSM family. Right now I'll work on making sure all of my code is
release-worthy. :)

James Thompson

On Fri, 1 Oct 2004, Aaron J. Mackey wrote:

> 
> On Sep 30, 2004, at 10:49 PM, James Thompson wrote:
> 
> > An alternative would be to borrow an idea from Perl's regex character
> > classes and represent multiple residues at a position inside of a set
> > of brackets, like this:
> >
> > M[ES]N[IAP]S
> 
> In general, you're always going to lose information moving from a 
> profile to a flat pattern.  This option prevents losing all the 
> information that flattening to "MENIS" would (although MENIS is a 
> reasonable "consensus" in this case), but there's still information 
> loss.  So in that sense it isn't really a better solution than "just 
> take the most probable residue, unless it's less than some threshold, 
> in which case X".
> 
> I think the whole idea of a consensus sequence from a profile is a bit 
> worthless, to be honest.  What are you supposed to be able to do with 
> the consensus, search with it?  That's what the profile is for in the 
> first place ... [ speaking of which, I'd love to see 
> Bio::Tools::dpAlign make use of these protein profiles ].
> 
> -Aaron
> 
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania       email:  amackey at pcbi.upenn.edu
> 415 S. University Avenue         office: 215-898-1205
> Philadelphia, PA  19104-6017     fax:    215-746-6697
> 
> 