[EMBOSS] sixpack -orfminsize
Peter Rice
pmr at ebi.ac.uk
Fri Mar 2 09:02:12 UTC 2012
On 03/01/12 19:18, Ed Siefker wrote:
> Why do I get protein sequences that are shorter than orfminsize/3 when
> I use that option with sixpack? Do I misunderstand what an ORF is?
-orfminsize is measured in amino acids (protein residues), not
nucleotides, so your threshold of 500 is higher than the longest ORF.
We will update the help and documentation to make this clearer.
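
As a quick check of the units, here is a minimal Python sketch of the
conversion. The helper names are made up for illustration and are not
part of EMBOSS.

```python
def bp_to_aa(length_bp: int) -> int:
    """Number of whole codons (amino acids) in a nucleotide length."""
    return length_bp // 3


def aa_to_bp(length_aa: int) -> int:
    """Number of nucleotides needed to encode a protein length."""
    return length_aa * 3


# A 500 bp ORF corresponds to a threshold of 166 amino acids.
print(bp_to_aa(500))  # 166
# A threshold of 500 amino acids needs an ORF of 1500 bp.
print(aa_to_bp(500))  # 1500
```

So to ask for ORFs of at least 500 bp, the value to pass would be about
166, not 500.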
> ebs15242 at biox:~/Data/EMBOSS$ sixpack -orfminsize 500 kras.gb
> Display a DNA sequence with 6-frame translation and ORFs
> Output file [nm_004985.sixpack]:
> protein output sequence(s) [nm_004985.fasta]:
> ebs15242 at biox:~/Data/EMBOSS$ head nm_004985.fasta
> >NM_004985_1_ORF1 Translation of NM_004985 in frame 1, ORF 1, threshold 500, 62aa
> GRGGGGSSGGGSGGGEGGGGSASTPGPRHFGLGASAAQALKAAAGPEAQRLPGAGERPAE
> ND
> >NM_004985_1_ORF2 Translation of NM_004985 in frame 1, ORF 2, threshold 500, 31aa
> SKICNIFVMNCTTPNYCNVIKIVTVTKKKKX
>
> 62aa and 31aa lead to sizes of 186bp and 93bp. What's happening to
> the other 300-400 base pairs in the ORF? Why are they not getting
> translated?
With those options you will see only 12 ORFs, all shorter than your
500aa threshold.
Sixpack also reports (by default) the ORFs at the beginning and end of
the sequence in case they are part of a longer translation. These are
the ORFs that you see in the output. You can turn them off with the
options -nofirst and -nolast.
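
The effect of the first/last reporting can be sketched roughly as
follows. This is a simplified illustration, not sixpack's actual code:
it assumes an ORF is a stretch between stop codons (written as '*') in
one translated frame and that the size test is inclusive. The function
and parameter names are hypothetical.

```python
def report_orfs(translation, min_size, keep_first=True, keep_last=True):
    """Return the ORFs of one translated frame that would be reported.

    Assumption: ORFs are the segments between '*' stop characters.
    The first and last segments may be truncated by the sequence ends,
    so they are kept regardless of min_size unless switched off.
    """
    segments = translation.split("*")
    last_index = len(segments) - 1
    reported = []
    for index, segment in enumerate(segments):
        if not segment:
            continue  # nothing between two adjacent stops
        long_enough = len(segment) >= min_size
        is_kept_end = (index == 0 and keep_first) or (
            index == last_index and keep_last
        )
        if long_enough or is_kept_end:
            reported.append(segment)
    return reported


frame = "MKT*AAAAAAAA*GG"
# Threshold too high: only the two end ORFs come back.
print(report_orfs(frame, 10))  # ['MKT', 'GG']
# Same threshold with the end ORFs switched off: nothing is reported.
print(report_orfs(frame, 10, keep_first=False, keep_last=False))  # []
# A threshold the internal ORF can meet.
print(report_orfs(frame, 8, keep_first=False, keep_last=False))  # ['AAAAAAAA']
```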
The longest ORF in your sequence is 208aa. The -orfminsize value is the
minimum length of the protein translation, so 208 is the largest
threshold that will find a complete ORF.
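
To see which lengths sixpack actually reported, the sizes can be read
back from the FASTA headers. A minimal sketch, assuming every header
ends in '<N>aa' as in the output quoted above; the function name is
hypothetical.

```python
import re


def orf_lengths(fasta_text):
    """Map each record ID to the amino-acid length given in its header."""
    lengths = {}
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, not a header
        match = re.search(r"(\d+)aa\s*$", line)
        if match:
            record_id = line[1:].split()[0]
            lengths[record_id] = int(match.group(1))
    return lengths


example = """\
>NM_004985_1_ORF1 Translation of NM_004985 in frame 1, ORF 1, threshold 500, 62aa
GRGGGGSSGGGSGGGEGGGGSASTPGPRHFGLGASAAQALKAAAGPEAQRLPGAGERPAE
ND
>NM_004985_1_ORF2 Translation of NM_004985 in frame 1, ORF 2, threshold 500, 31aa
SKICNIFVMNCTTPNYCNVIKIVTVTKKKKX
"""

lengths = orf_lengths(example)
print(lengths)  # {'NM_004985_1_ORF1': 62, 'NM_004985_1_ORF2': 31}
print(max(lengths.values()))  # 62
```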
Hope this helps,
Peter Rice
EMBOSS Team