[EMBOSS] sixpack -orfminsize

Peter Rice pmr at ebi.ac.uk
Fri Mar 2 09:02:12 UTC 2012


On 03/01/12 19:18, Ed Siefker wrote:
> Why do I get protein sequences that are shorter than orfminsize/3 when
> I use that option with sixpack?   Do I misunderstand what an ORF is?

-orfminsize is in protein units (amino acids), not base pairs, so your
threshold of 500 is higher than the longest ORF.

We will update the help and documentation to make this clearer.
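To pick an -orfminsize value from a length in base pairs, divide by three (one amino acid per codon). The sketch below does that arithmetic in Python. The function names are hypothetical helpers, not part of EMBOSS, and it assumes the reported length excludes the stop codon, as the 62aa/186bp figures in the question do.

```python
def aa_to_bp(aa: int) -> int:
    """Bases covered by an ORF of `aa` residues (stop codon not counted)."""
    return aa * 3


def bp_to_aa(bp: int) -> int:
    """Smallest -orfminsize (in aa) whose ORFs span at least `bp` bases.

    Rounds up, so e.g. 1000 bp needs 334 aa (333 aa would be only 999 bp).
    """
    return -(-bp // 3)
```

For example, bp_to_aa(1500) returns 500, so -orfminsize 500 asks for ORFs of at least 1500 bp.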

> ebs15242 at biox:~/Data/EMBOSS$ sixpack -orfminsize 500 kras.gb
> Display a DNA sequence with 6-frame translation and ORFs
> Output file [nm_004985.sixpack]:
> protein output sequence(s) [nm_004985.fasta]:
> ebs15242 at biox:~/Data/EMBOSS$ head nm_004985.fasta
>> NM_004985_1_ORF1  Translation of NM_004985 in frame 1, ORF 1, threshold 500, 62aa
> GRGGGGSSGGGSGGGEGGGGSASTPGPRHFGLGASAAQALKAAAGPEAQRLPGAGERPAE
> ND
>> NM_004985_1_ORF2  Translation of NM_004985 in frame 1, ORF 2, threshold 500, 31aa
> SKICNIFVMNCTTPNYCNVIKIVTVTKKKKX
>
> 62aa and 31aa lead to sizes of 186bp and 93bp.  What's happening to
> the other  300-400 base pairs in the ORF?  Why are they not getting
> translated?

With those options, sixpack reports only 12 ORFs, all shorter than your
500aa threshold.

Sixpack also reports (by default) the ORFs at the beginning and end of
the sequence in case they are part of a longer translation. These are
the ORFs that you see in the output. You can turn them off with the
options -nofirst and -nolast.
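A minimal model of the behaviour described above, assuming a stop-to-stop ORF definition: split one translated frame at stop codons ('*'), keep segments of at least min_aa residues, and keep the first and last segments regardless of length unless they are switched off. The `first` and `last` arguments are hypothetical stand-ins for -nofirst and -nolast; this is an illustrative sketch, not sixpack's actual code.

```python
def orfs_in_frame(protein: str, min_aa: int,
                  first: bool = True, last: bool = True) -> list[str]:
    """Return stop-to-stop segments of one translated frame.

    Segments of at least `min_aa` residues are always kept. The first and
    last segments (possibly partial ORFs running off the sequence ends)
    are kept regardless of length while `first` / `last` are True.
    """
    segments = protein.split("*")
    kept = []
    for i, seg in enumerate(segments):
        is_first = i == 0
        is_last = i == len(segments) - 1
        long_enough = len(seg) >= min_aa
        if seg and (long_enough or (first and is_first) or (last and is_last)):
            kept.append(seg)
    return kept
```

With min_aa=500 and the defaults, only the short first and last fragments survive, which matches the 62aa and 31aa ORFs in the question. Passing first=False and last=False leaves nothing.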

The longest ORF in your sequence is 208aa. The -orfminsize value is the
length of the protein translation, so 208 is the largest threshold that
will still find a complete ORF.
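Following the same reasoning, the largest useful threshold is the length of the longest complete ORF (a segment with a stop codon on both sides) over all six frames. The hypothetical helper below computes it from already-translated frames, assuming stops are written as '*' and that -orfminsize is an inclusive minimum.

```python
def largest_useful_threshold(frames: list[str]) -> int:
    """Length (aa) of the longest stop-bounded ORF in any frame.

    Segments before the first stop or after the last stop are partial
    (they run off the sequence ends), so they are ignored here.
    """
    best = 0
    for protein in frames:
        segments = protein.split("*")
        # segments[1:-1] are the ones with a stop codon on both sides
        for seg in segments[1:-1]:
            best = max(best, len(seg))
    return best
```

For the sequence in the question this value is 208aa; any -orfminsize above it leaves only the first and last fragments.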

Hope this helps,

Peter Rice
EMBOSS Team


