[EMBOSS] getorf: propose new switch '-longest'

Peter Cock p.j.a.cock at googlemail.com
Tue Jun 4 09:32:33 UTC 2013


On Tue, Jun 4, 2013 at 9:29 AM, Chevreux, Bastien
<bastien.chevreux at dsm.com> wrote:
> Hi there,
>
> in case someone goes over 'getorf' for a next revision: I propose to
> add a switch like '-longest' to retrieve the longest potential ORF per
> sequence. Maybe also something like '-grace PERCENT' which - in
> combination with '-longest' - would retrieve the longest ORF and all
> ORFs which are up to PERCENT smaller than the longest one.
>
> Best,
>   Bastien
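The proposed '-longest' plus '-grace PERCENT' combination boils down to a length filter applied after all ORFs for a sequence have been found. Below is a minimal Python sketch, assuming ORFs are already available as (start, end, sequence) tuples and reading "up to PERCENT smaller" as "length >= longest * (100 - PERCENT) / 100". The function name `select_longest_with_grace` is hypothetical and not part of EMBOSS.

```python
from typing import List, Tuple

# Hypothetical ORF record for this sketch: (start, end, sequence).
# Not an EMBOSS data type.
Orf = Tuple[int, int, str]


def select_longest_with_grace(orfs: List[Orf], percent: int = 0) -> List[Orf]:
    """Return the longest ORF(s) plus any ORF at most `percent` % shorter.

    With percent=0 this reduces to a plain '-longest' selection
    (all equally long ORFs are kept).
    """
    if not orfs:
        return []
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    longest = max(len(seq) for _, _, seq in orfs)
    # Integer arithmetic avoids floating point rounding at the cut-off:
    # keep ORFs with len * 100 >= longest * (100 - percent).
    return [orf for orf in orfs if len(orf[2]) * 100 >= longest * (100 - percent)]
```

For example, with ORF lengths 10, 9 and 5, a grace of 10 keeps the first two, and a grace of 50 keeps all three. Lengths here are whatever the sequence field holds (amino acids or nucleotides).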

Good idea - I was just going over my Galaxy getorf equivalent
and comparing the two; at the time I may have forgotten that
EMBOSS getorf already had a Galaxy wrapper:

http://toolshed.g2.bx.psu.edu/view/peterjc/get_orfs_or_cdss

One major difference is that my script does have a "longest" mode -
actually three modes, to offer some choice of tie-breaker when
there are multiple equally long ORFs:

* All ORFs/CDSs from each sequence
* All ORFs/CDSs from each sequence with the maximum length
* First ORF/CDS from each sequence with the maximum length
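The three modes above differ only in how ties are handled. A small Python sketch of that selection step, applied to the ORFs found in one input sequence and assuming they are listed in the order the ORF finder reported them. The mode names `all`, `all_longest` and `first_longest` are hypothetical labels for this illustration, not the option values used by the Galaxy tool.

```python
from typing import List, Sequence

# Hypothetical mode names, chosen for this sketch only.
MODES = ("all", "all_longest", "first_longest")


def pick_orfs(orfs: Sequence[str], mode: str = "all") -> List[str]:
    """Select ORFs/CDSs from ONE input sequence according to `mode`.

    `orfs` is assumed to be in the order they were found, so "first"
    means the first one encountered by the ORF finder.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "all" or not orfs:
        return list(orfs)
    max_len = max(len(o) for o in orfs)
    # Every ORF sharing the maximum length, in original order.
    longest = [o for o in orfs if len(o) == max_len]
    if mode == "all_longest":
        return longest
    # "first_longest": break ties by keeping only the first one.
    return longest[:1]
```

Keeping the first of several equally long candidates makes the output deterministic for a given input order, which is handy when exactly one sequence per transcript is needed downstream.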

This was written specifically as a way to quickly pull out probable
coding sequences from transcriptome assemblies (as amino acids
or nucleotides), where for simplicity I just wanted the longest
candidate protein sequence.
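To illustrate the "longest candidate protein" idea end to end, here is a rough stdlib-only Python sketch. It assumes a stop-to-stop ORF definition (no start codon required, sequence ends treated as open boundaries), the standard genetic code, and a plain nucleotide string as input. It is not the code from the Tool Shed repository above; `translate` and `longest_protein` are hypothetical names.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in
# T, C, A, G order, first base varying slowest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(nuc: str) -> str:
    """Translate in frame; codons containing ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(nuc[i:i + 3], "X")
                   for i in range(0, len(nuc) - 2, 3))


def longest_protein(seq: str, min_len: int = 1) -> str:
    """Return the longest stop-to-stop translated region over six frames.

    Ties keep the first candidate found (forward frames 0-2, then the
    reverse complement frames 0-2). Returns "" if nothing reaches min_len.
    """
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    best = ""
    for strand in (seq, rev):
        for frame in range(3):
            # Splitting on '*' yields the peptides between stop codons.
            for peptide in translate(strand[frame:]).split("*"):
                if len(peptide) >= min_len and len(peptide) > len(best):
                    best = peptide
    return best
```

Note that with open-ended boundaries even a short input can yield a "longest" hit on the reverse strand: for `ATGAAATTTTAA` the forward frame gives `MKF`, but the reverse complement (`TTAAAATTTCAT`) translates to the longer `LKFH`. Requiring a start codon, or a sensible `min_len`, would change which candidate wins.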

Peter C.
