[Bioperl-l] Protein families

Wed Feb 11 21:02:09 UTC 2009

Hello everyone,

This question is somewhat unrelated to Bioperl technical issues, but I
hope I can get some answers.
What would be a sane way to assess whether a sequence is part of a
family? Since that is too broad a question, I'll restrict it:
   - It doesn't have to use online services.
   - It has to be scriptable.
   - It has to rely only on the amino acid sequence (i.e., no
      experimental evidence, including 3D structure).
   - If possible, it should be fast.
   - For extra points, it should be simple (or complicated, but have a
      ready-to-use library).

The context is this: I want to perform some genetic algorithm (GA)
randomization on a protein sequence to optimize for an arbitrary
target function (for instance, increasing the occurrence of a certain
type of proteolytic enzymes), but I also want to minimize the chance
of losing the protein's original function. So I thought that I'd need
some sort of quantitative measure of how close the sequence is to
belonging to the original's family.

The simplest way that I can think of for doing this is to first
build a profile for the family, based on a multiple sequence
alignment; then to align each random sequence against the profile and
calculate an e-value. But since I don't know much about these things,
I really can't judge whether it makes sense or is completely wrong.
Using Bio::Tools::HMM sounded fine, but unfortunately it doesn't offer
a method for calculating the probability of an observation sequence,
given the profile.
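
To make the profile idea concrete, here is a minimal sketch in Python
(not Bioperl, and not an HMM). It assumes an ungapped alignment of
equal-length sequences, a uniform background frequency of 1/20 per
residue, and add-one pseudocounts; the names build_profile and
score_sequence are made up for illustration. It returns a raw log-odds
score, not an e-value, which would additionally need a calibrated
score distribution.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log2-odds scores per alignment column.

    Assumptions (for illustration only): no gaps, all sequences the
    same length, uniform background frequency for every residue.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)

    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        column = {}
        for aa in AMINO_ACIDS:
            # Pseudocounts keep unseen residues from scoring -infinity.
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        profile.append(column)
    return profile


def score_sequence(profile, sequence):
    """Sum the per-column log-odds of a candidate of the same length."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match the profile length")
    # Residues outside the 20 standard ones contribute a neutral 0.0.
    return sum(col.get(aa, 0.0) for col, aa in zip(profile, sequence))


# Toy family, purely hypothetical data.
family = ["ACDK", "ACDR", "ACEK", "GCDK"]
profile = build_profile(family)
family_like = score_sequence(profile, "ACDK")
unrelated = score_sequence(profile, "WWWW")
```

In a GA loop, a score like family_like could serve as a penalty term
next to the target function. A real family would need a proper
multiple sequence alignment and gap handling, which this sketch
deliberately leaves out.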

What would you suggest? Thanks in advance!

PS: If there is a more appropriate mailing list for this sort of
question, please don't hesitate to educate me.

Bruno.
