[Bioperl-l] (no subject)

Mon Jul 21 12:36:04 UTC 2008

Hi Ronnie,
I'm not sure I'm following you -- you start with a database of cDNA
sequences, but you're asking how to obtain the cDNA sequence.

Do you mean: once you've used HMMer in protein space to identify the subset
of sequences that contain a certain domain, how do you pick out the
corresponding cDNA sequences from your starting database?

I'm not sure this is what you mean, but you should be able to generate a
lookup hash of which protein sequence identifier corresponds to which cDNA
identifier. Once you've used your protein IDs to get the list of cDNA IDs,
then you can extract the cDNA sequences from your original database. It
would probably be possible to use Bio::Annotation to keep track of the
relationship between a protein ID and a cDNA ID, but this seems like
overkill to me compared to a plain old hash.
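
To make the lookup idea concrete, here is a rough sketch. It is written in
Python only to keep it short; the dict plays the role of the plain Perl hash.
All identifiers and sequences below are made up, and I'm assuming you recorded
which protein ID came from which cDNA ID when you did the translation.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict of {identifier: sequence}."""
    records = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Identifier = first whitespace-delimited token after '>'
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line
    return records


def cdna_for_hits(hit_ids, lookup, cdna_records):
    """Map protein hit IDs to cDNA IDs, then pull those cDNA sequences."""
    selected = {}
    for protein_id in hit_ids:
        cdna_id = lookup.get(protein_id)
        # Skip hits with no known cDNA, or whose cDNA is not in the database
        if cdna_id is not None and cdna_id in cdna_records:
            selected[cdna_id] = cdna_records[cdna_id]
    return selected


# Hypothetical starting cDNA database (stands in for your FASTA file)
CDNA_FASTA = """>cdna_001 example
ATGGCCAAGTAA
>cdna_002 example
ATGTTTGGGCCCTAA
>cdna_003 example
ATGAAACGTTAA
"""

# Hypothetical lookup: protein ID -> cDNA ID, built at translation time
protein_to_cdna = {
    "prot_001": "cdna_001",
    "prot_002": "cdna_002",
    "prot_003": "cdna_003",
}

# Hypothetical protein IDs that the domain search reported as hits
hmmer_hits = ["prot_001", "prot_003"]

cdna_records = read_fasta(io.StringIO(CDNA_FASTA))
selected = cdna_for_hits(hmmer_hits, protein_to_cdna, cdna_records)

# Write the selected cDNA sequences back out in FASTA format
for cdna_id, seq in selected.items():
    print(f">{cdna_id}\n{seq}")
```

In a real run you would read the cDNA file and the hit list from disk rather
than hard-coding them, but the two steps stay the same: protein ID to cDNA ID
via the lookup, then cDNA ID to sequence from the original database.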

If you haven't already, you may want to check out the PAML HOWTO on the
BioPerl website:
http://www.bioperl.org/wiki/HOWTO:PAML#Running_PAML_from_within_BioPerl

which shows the pairwise_kaks script. Or look directly at the script itself,
included in the bioperl-live distribution under scripts/utilities.

In any case, I'm not sure I've answered your question -- please follow up if
I've missed the point.

Dave