[EMBOSS] using sixpack

Peter Rice pmr at ebi.ac.uk
Wed Dec 2 09:43:42 UTC 2009


On 12/01/2009 04:17 PM, Matthias Dodt wrote:
> Hi there!
>
> I have some problems using sixpack for 6-frame translation. I want to
> convert a fasta file of contigs with sixpack. The command is:
>
> sixpack contigs.fa -outseq protein_sequence
>
> The problem is that sixpack only converts the first sequence in the
> fasta file. How can I force it to process the whole file?

Two options:

One is to change the EMBOSS code to loop over each sequence.

The other is to write a script that extracts each sequence in turn and
launches sixpack.

We can consider this for the next EMBOSS release. It applies to other
applications too. In general, would users (and developers of web and
other interfaces) be happy if more applications could read every
sequence in a fasta file?

This raises questions of how to mark up the output so that it is clear
where each result comes from. There will always be applications where
it is more sensible to process only a single sequence.
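
The script approach (the second option above) could be sketched roughly
as follows. This is only an illustration, not part of EMBOSS: the
function names, the output directory and the file naming scheme are
hypothetical, and it assumes sixpack is on the PATH and is called with
its -sequence, -outfile and -outseq qualifiers.

```python
import re
import subprocess
from pathlib import Path


def split_fasta(text):
    """Split multi-FASTA text into (sequence_id, record_text) pairs."""
    records = []
    current = None  # lines of the record currently being collected
    for line in text.splitlines():
        line = line.rstrip()
        if line.startswith(">"):
            current = [line]
            records.append(current)
        elif line and current is not None:
            current.append(line)
    result = []
    for lines in records:
        words = lines[0][1:].split()
        seq_id = words[0] if words else "unnamed"
        result.append((seq_id, "\n".join(lines) + "\n"))
    return result


def sixpack_command(fasta_path, prefix):
    """Build the sixpack command line for one single-sequence file."""
    return [
        "sixpack",
        "-sequence", str(fasta_path),
        "-outfile", f"{prefix}.sixpack",  # translation display
        "-outseq", f"{prefix}.pep",       # translated ORF sequences
    ]


def run_sixpack_per_sequence(multi_fasta, out_dir="sixpack_out"):
    """Write each record to its own file and run sixpack on it."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    text = Path(multi_fasta).read_text()
    for index, (seq_id, record) in enumerate(split_fasta(text), start=1):
        # Keep file names safe whatever characters the identifier contains.
        safe_id = re.sub(r"[^A-Za-z0-9_.-]", "_", seq_id)
        prefix = out_path / f"{index:04d}_{safe_id}"
        single = Path(f"{prefix}.fa")
        single.write_text(record)
        subprocess.run(sixpack_command(single, prefix), check=True)
```

Calling run_sixpack_per_sequence("contigs.fa") would then leave one
.sixpack and one .pep file per contig in sixpack_out, so each result
can be traced back to its input sequence by file name.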

A third option (there is so often another way):

getorf will find and report open reading frames in all input sequences:

getorf contigs.fa -outseq protein_sequence

There will be differences in the output - by default getorf only reports
ORFs of at least 30 nucleotides. You get the same effect in sixpack with
-orfmin 10 (oops, sixpack counts amino acids - we will try to make them
consistent in the next release!)

You can also add -minsize 3 to the getorf command line to report all
ORFs like sixpack does.
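
Because the two programs count in different units, a small helper can
keep the thresholds in step. The sketch below is hypothetical (the
function name is made up for illustration) and assumes that getorf
-minsize is given in nucleotides and that sixpack -orfminsize, the
qualifier abbreviated as -orfmin above, is given in amino acids.

```python
def matching_min_orf_options(min_codons):
    """Return (getorf_args, sixpack_args) for the same minimum ORF length.

    min_codons is the minimum ORF length in codons (amino acids).
    """
    if min_codons < 1:
        raise ValueError("min_codons must be at least 1")
    getorf_args = ["-minsize", str(3 * min_codons)]  # nucleotides
    sixpack_args = ["-orfminsize", str(min_codons)]  # amino acids
    return getorf_args, sixpack_args


# 10 codons corresponds to getorf -minsize 30 and sixpack -orfminsize 10.
getorf_args, sixpack_args = matching_min_orf_options(10)
```

With min_codons set to 1 this gives -minsize 3 for getorf, the setting
suggested above for reporting all ORFs.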

Hope this helps,

Peter Rice


