[Bioperl-l] Help on a basic EST-genomic alignment script

Fri Jun 25 18:24:57 EDT 2004

Hi all,

I'm a very new perl/bioperl user, and I'm running into some trouble so
I hope I can find some help here.

Here's what I need to do:

(from a large number of fasta files) Extract an EST (peromyscus) from
the FASTA file, find its closest match in mus musclus, and find the
dn/ds between the two sequences.

Here's what I'm doing (I know this is probably the least efficient
way, any suggestions?):

1) Read in pero EST from a FASTA
2) Standaloneblast it to local mus cDNA database, retrieve accession
from best result
3) Retrieve complete mus sequence with features from genbank using ID from (2)
4) Make a clustalw simple align object using the mus protein sequence
from (3) against the translated pero EST for all 3 frames, and keep
the one with the best identity %.
--I'm done up to here--
5) Convert the aln frrom AA to DNA (there is a builtin aa_to_dna_aln
but it isn't working for me)
6) Pass the aln through a DN/DS module (is paml the only one?)

So I have 2 problems: 7 out of the 20 ESTs return a poor "best
alignment" with less than 20% identity (and in the printout they
clearly are not aligning). Does this have something to do with gaps?

In spite of that, the other 13 are aligning pretty well, with 60-100%
alignment. In order to calculate DN/DS I've looked around and it seems
I have to use PAML. But before that I think I'm required to have an
aln object of two DNA sequences, starting at the correct frame. How
can I do that?

-- 
Edward Chuong
http://iacs5.ucsd.edu/~echuong