[EMBOSS] Transeq question, frame phases

David Mathog mathog at caltech.edu
Wed Feb 16 22:47:15 UTC 2011


Here is another worked example, using a small but real mRNA fragment.
(It is best viewed in a program with a fixed-width font.)

Test sequence:

>for (AKA gi|1728|emb|V00893.1; this is the "+" direction)
TCGAAAACCGGGCCATGAAGGATGAGGAGAAGATGGAGCTGCA
GGAGATGCAGCTGAAGGAGGCCAAGCACATTGCCGAGGACTCA
GACCGCAAATACGAGGAGGTGGCCAGGAAGCTGGTGATCCTCGA

>rev (reverse complement of >for; this is the "-" direction)
TCGAGGATCACCAGCTTCCTGGCCACCTCCTCGTATTTGCGGT
CTGAGTCCTCGGCAATGTGCTTGGCCTCCTTCAGCTGCATCTC
CTGCAGCTCCATCTTCTCCTCATCCTTCATGGCCCGGTTTTCGA
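The >rev record is the reverse complement of >for, not just the reversed string. A minimal Python check, where `revcomp` is a hypothetical helper written for this example rather than an EMBOSS routine:

```python
# Map each base to its Watson-Crick complement (upper-case A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# The two test records from above, joined into single strings.
FOR = ("TCGAAAACCGGGCCATGAAGGATGAGGAGAAGATGGAGCTGCA"
       "GGAGATGCAGCTGAAGGAGGCCAAGCACATTGCCGAGGACTCA"
       "GACCGCAAATACGAGGAGGTGGCCAGGAAGCTGGTGATCCTCGA")
REV = ("TCGAGGATCACCAGCTTCCTGGCCACCTCCTCGTATTTGCGGT"
       "CTGAGTCCTCGGCAATGTGCTTGGCCTCCTTCAGCTGCATCTC"
       "CTGCAGCTCCATCTTCTCCTCATCCTTCATGGCCCGGTTTTCGA")

print(revcomp(FOR) == REV)     # True
print(len(FOR), len(FOR) % 3)  # 130 1
```

With N = 130 and N mod 3 = 1, frame 1 leaves one base over at the 3' end, frame 2 ends exactly on base N, and frame 3 leaves two bases over, which is why only some of the translations end in X.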

Transeq output, all 6 frames, for >for and >rev. The suffixes
_1 to _3 are frames 1 to 3, and _4 to _6 are frames -1 to -3:
>for_1
SKTGP*RMRRRWSCRRCS*RRPSTLPRTQTANTRRWPGSW*SSX
>for_2
RKPGHEG*GEDGAAGDAAEGGQAHCRGLRPQIRGGGQEAGDPR
>for_3
ENRAMKDEEKMELQEMQLKEAKHIAEDSDRKYEEVARKLVILX
>for_4
RGSPASWPPPRICGLSPRQCAWPPSAASPAAPSSPHPSWPGFR
>for_5
SRITSFLATSSYLRSESSAMCLASFSCISCSSIFSSSFMARFSX
>for_6
EDHQLPGHLLVFAV*VLGNVLGLLQLHLLQLHLLLILHGPVFX
>rev_1
SRITSFLATSSYLRSESSAMCLASFSCISCSSIFSSSFMARFSX
>rev_2
RGSPASWPPPRICGLSPRQCAWPPSAASPAAPSSPHPSWPGFR
>rev_3
EDHQLPGHLLVFAV*VLGNVLGLLQLHLLQLHLLLILHGPVFX
>rev_4
RKPGHEG*GEDGAAGDAAEGGQAHCRGLRPQIRGGGQEAGDPR
>rev_5
SKTGP*RMRRRWSCRRCS*RRPSTLPRTQTANTRRWPGSW*SSX
>rev_6
ENRAMKDEEKMELQEMQLKEAKHIAEDSDRKYEEVARKLVILX
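The pairing above (for_4 = rev_2, for_5 = rev_1, for_6 = rev_3) is consistent with transeq's default definition of the reverse frames: frame -1 (output _4) reads the same codons as frame 1, on the opposite strand, and likewise -2/2 and -3/3. The transeq documentation describes an `-alternative` qualifier for the other convention, where frame -1 starts from the last codon of the sequence. Below is a minimal Python sketch of the default convention only, assuming the standard genetic code and simply emitting X for any trailing partial codon (which is what this example shows; transeq's own rule for partial codons may be more involved). `six_frames` and `translate` are hypothetical helpers, not transeq's code:

```python
# Standard genetic code (NCBI table 1). Codons are enumerated with the bases
# in TCAG order: TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from the first base. Unknown codons and a trailing
    partial codon both become X (a simplification, see text)."""
    protein = [CODON_TABLE.get(seq[i:i + 3], "X")
               for i in range(0, len(seq) - 2, 3)]
    if len(seq) % 3:
        protein.append("X")  # one or two leftover bases at the 3' end
    return "".join(protein)


def six_frames(seq: str) -> dict:
    """Frames 1..3 and -1..-3, where frame -k reads the same codons as
    frame k on the opposite strand (assumed transeq default)."""
    n = len(seq)
    rc = revcomp(seq)
    frames = {}
    for k in (1, 2, 3):
        frames[k] = translate(seq[k - 1:])
        # Frame k leaves (n - k + 1) % 3 bases unused at the 3' end of seq;
        # they come first in rc, so skip them to stay on frame k's codons.
        frames[-k] = translate(rc[(n - k + 1) % 3:])
    return frames


FOR = ("TCGAAAACCGGGCCATGAAGGATGAGGAGAAGATGGAGCTGCA"
       "GGAGATGCAGCTGAAGGAGGCCAAGCACATTGCCGAGGACTCA"
       "GACCGCAAATACGAGGAGGTGGCCAGGAAGCTGGTGATCCTCGA")

for k, protein in six_frames(FOR).items():
    print(k, protein)
```

Because the frame -k codons are chosen to coincide with frame k's, the first bases of the reverse complement are skipped silently rather than emitted as X, matching the observation that transeq output never starts with an X.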

Output from a different program, showing all 12 frame
options for the same sequence. Each is labeled on the
FASTA header line as:

  phase(strand)

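Positive phases are measured from sequence position 1
and negative phases from position N, the last base in
the sequence: in phase +k the first codon starts at
base k, and in phase -k the codon grid ends at base
N-k+1. The + or - in parentheses is the strand that
is translated.
This program differs from transeq in that every partial
codon is emitted as an X, whether it falls at the start
or at the end of the translation. Note how transeq
output never starts with an X, whereas here the X keeps
its position on the nucleic acid sequence. For instance,
the trailing X of +1(+) and the leading X of +1(-) both
stand for the single base N left over in phase +1.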
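The phase(strand) convention can be sketched in Python. This is a reconstruction from the description and output in this post, not the other program's actual code; `phase_translate` is a hypothetical name, the standard genetic code is assumed, and every partial codon inside the grid, at either end, becomes X:

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_codons(core: str) -> str:
    """Translate a string whose length is a multiple of 3."""
    return "".join(CODON_TABLE.get(core[i:i + 3], "X")
                   for i in range(0, len(core), 3))


def phase_translate(seq: str, phase: int, strand: str) -> str:
    """Translate seq in phase +1..+3 or -1..-3 on strand '+' or '-'.

    Phase +k: the codon grid starts at base k and runs toward base N.
    Phase -k: the grid ends at base N-k+1 and runs toward base 1.
    Bases outside the grid are ignored; a partial codon inside it is X.
    """
    n = len(seq)
    if phase > 0:
        seg = seq[phase - 1:]           # grid anchored at the left end
        r = len(seg) % 3
        left, core, right = "", seg[:len(seg) - r], seg[len(seg) - r:]
    else:
        seg = seq[:n + phase + 1]       # grid anchored at the right end
        r = len(seg) % 3
        left, core, right = seg[:r], seg[r:], ""
    if strand == "+":
        return ("X" if left else "") + translate_codons(core) + ("X" if right else "")
    # Minus strand: the same grid read backwards, so the partial codons swap ends.
    return ("X" if right else "") + translate_codons(revcomp(core)) + ("X" if left else "")


FOR = ("TCGAAAACCGGGCCATGAAGGATGAGGAGAAGATGGAGCTGCA"
       "GGAGATGCAGCTGAAGGAGGCCAAGCACATTGCCGAGGACTCA"
       "GACCGCAAATACGAGGAGGTGGCCAGGAAGCTGGTGATCCTCGA")

for strand in "+-":
    for phase in (1, 2, 3, -1, -2, -3):
        print(f"[{phase:+d}({strand})]", phase_translate(FOR, phase, strand))
```

The only difference between a positive and a negative phase is which end of the sequence the codon grid is pinned to; the strand then decides whether that grid is read forwards or as its reverse complement.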

>gi|1728|emb|V00893.1|[+1(+)] 
SKTGP*RMRRRWSCRRCS*RRPSTLPRTQTANTRRWPGSW*SSX
>gi|1728|emb|V00893.1|[+2(+)] 
RKPGHEG*GEDGAAGDAAEGGQAHCRGLRPQIRGGGQEAGDPR
>gi|1728|emb|V00893.1|[+3(+)] 
ENRAMKDEEKMELQEMQLKEAKHIAEDSDRKYEEVARKLVILX
>gi|1728|emb|V00893.1|[+1(-)] 
XRGSPASWPPPRICGLSPRQCAWPPSAASPAAPSSPHPSWPGFR
>gi|1728|emb|V00893.1|[+2(-)] 
SRITSFLATSSYLRSESSAMCLASFSCISCSSIFSSSFMARFS
>gi|1728|emb|V00893.1|[+3(-)] 
XEDHQLPGHLLVFAV*VLGNVLGLLQLHLLQLHLLLILHGPVF
>gi|1728|emb|V00893.1|[-1(-)] 
SRITSFLATSSYLRSESSAMCLASFSCISCSSIFSSSFMARFSX
>gi|1728|emb|V00893.1|[-2(-)] 
RGSPASWPPPRICGLSPRQCAWPPSAASPAAPSSPHPSWPGFR
>gi|1728|emb|V00893.1|[-3(-)] 
EDHQLPGHLLVFAV*VLGNVLGLLQLHLLQLHLLLILHGPVFX
>gi|1728|emb|V00893.1|[-1(+)] 
XRKPGHEG*GEDGAAGDAAEGGQAHCRGLRPQIRGGGQEAGDPR
>gi|1728|emb|V00893.1|[-2(+)] 
SKTGP*RMRRRWSCRRCS*RRPSTLPRTQTANTRRWPGSW*SS
>gi|1728|emb|V00893.1|[-3(+)] 
XENRAMKDEEKMELQEMQLKEAKHIAEDSDRKYEEVARKLVIL
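To line the two naming schemes up, a small Python sketch can compare the outputs pasted above after dropping the X's that stand for partial codons. The dictionaries only hold the sequences from this post; `strip_partial` and `matching_labels` are hypothetical helpers:

```python
# Transeq output for >for, copied from above (_4.._6 are frames -1..-3).
TRANSEQ = {
    "for_1": "SKTGP*RMRRRWSCRRCS*RRPSTLPRTQTANTRRWPGSW*SSX",
    "for_2": "RKPGHEG*GEDGAAGDAAEGGQAHCRGLRPQIRGGGQEAGDPR",
    "for_3": "ENRAMKDEEKMELQEMQLKEAKHIAEDSDRKYEEVARKLVILX",
    "for_4": "RGSPASWPPPRICGLSPRQCAWPPSAASPAAPSSPHPSWPGFR",
    "for_5": "SRITSFLATSSYLRSESSAMCLASFSCISCSSIFSSSFMARFSX",
    "for_6": "EDHQLPGHLLVFAV*VLGNVLGLLQLHLLQLHLLLILHGPVFX",
}
# The other program's output for the same sequence, keyed by phase(strand).
PHASED = {
    "+1(+)": "SKTGP*RMRRRWSCRRCS*RRPSTLPRTQTANTRRWPGSW*SSX",
    "+2(+)": "RKPGHEG*GEDGAAGDAAEGGQAHCRGLRPQIRGGGQEAGDPR",
    "+3(+)": "ENRAMKDEEKMELQEMQLKEAKHIAEDSDRKYEEVARKLVILX",
    "+1(-)": "XRGSPASWPPPRICGLSPRQCAWPPSAASPAAPSSPHPSWPGFR",
    "+2(-)": "SRITSFLATSSYLRSESSAMCLASFSCISCSSIFSSSFMARFS",
    "+3(-)": "XEDHQLPGHLLVFAV*VLGNVLGLLQLHLLQLHLLLILHGPVF",
    "-1(-)": "SRITSFLATSSYLRSESSAMCLASFSCISCSSIFSSSFMARFSX",
    "-2(-)": "RGSPASWPPPRICGLSPRQCAWPPSAASPAAPSSPHPSWPGFR",
    "-3(-)": "EDHQLPGHLLVFAV*VLGNVLGLLQLHLLQLHLLLILHGPVFX",
    "-1(+)": "XRKPGHEG*GEDGAAGDAAEGGQAHCRGLRPQIRGGGQEAGDPR",
    "-2(+)": "SKTGP*RMRRRWSCRRCS*RRPSTLPRTQTANTRRWPGSW*SS",
    "-3(+)": "XENRAMKDEEKMELQEMQLKEAKHIAEDSDRKYEEVARKLVIL",
}


def strip_partial(protein: str) -> str:
    """Drop X's at either end. Here they only mark partial codons; in
    general a terminal X could also be a genuinely ambiguous codon."""
    return protein.strip("X")


def matching_labels(protein: str) -> list:
    """phase(strand) labels whose translation equals this one,
    ignoring the X's that mark partial codons."""
    core = strip_partial(protein)
    return [label for label, p in PHASED.items() if strip_partial(p) == core]


for name, protein in TRANSEQ.items():
    print(name, matching_labels(protein))
```

For this sequence each transeq frame matches exactly two labels, one positive and one negative phase: for_1 matches +1(+) and -2(+), for_2 matches +2(+) and -1(+), for_3 matches +3(+) and -3(+), for_4 matches +1(-) and -2(-), for_5 matches +2(-) and -1(-), and for_6 matches +3(-) and -3(-).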

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
