[EMBOSS] getorf includes unspecified amino acids as part of the ORF sequence

Fungazid fungazid at yahoo.com
Tue Jan 12 13:33:14 UTC 2010


Hi Peter,

The input is a simple FASTA file containing only the letters A, C, G and T, so I wouldn't expect any Xs. Moreover, even if there were Ns (and there are none), the program cannot know whether they hide a stop codon, so it should not treat them as part of an ORF.
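
A minimal Python sketch of the two checks discussed here, offered as an illustration rather than as anything getorf does itself: confirming that a nucleotide sequence really contains nothing but A, C, G and T, and cutting a reported ORF into the regions before and after a run of Xs. The function names `non_acgt_positions` and `split_at_x` and the `min_len` parameter are hypothetical, and the example protein is the ORF quoted further down in this thread.

```python
import re

def non_acgt_positions(seq):
    """Return (start, end, text) for each run of characters other than A, C, G, T."""
    return [(m.start(), m.end(), m.group())
            for m in re.finditer(r"[^ACGT]+", seq.upper())]

def split_at_x(protein, min_len=1):
    """Split a translated ORF at runs of X, keeping pieces of at least min_len residues."""
    return [piece for piece in re.split(r"X+", protein.upper())
            if len(piece) >= min_len]

# The ORF reported by getorf in the quoted message (hypothetical variable name).
orf = ("LARLRFVVLGNSFIASAKGWSTPYGPTTFGPFRSCIYPRVFRSTRVRKAMATRIGSNRVN"
       "ILIRCTXXXXXXXXXXXXXXXXXXXXXXXXXNPYLGWWCYIFCIFR")

pieces = split_at_x(orf)
for piece in pieces:
    print(len(piece), piece)

# Any hit here would explain Xs in the translation.
print(non_acgt_positions("ACGTNNNACGT"))
```

An empty list from `non_acgt_positions` on the real input would support the claim that the Xs do not come from ambiguous bases in the sequence.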

Best,
Avi



----- Original Message ----
From: Peter <biopython at maubp.freeserve.co.uk>
To: Fungazid <fungazid at yahoo.com>
Cc: emboss at lists.open-bio.org
Sent: Mon, January 11, 2010 5:53:02 PM
Subject: Re: [EMBOSS] getorf includes unspecified amino acids as part of the ORF sequence

On Mon, Jan 11, 2010 at 2:26 PM, Fungazid <fungazid at yahoo.com> wrote:
>
> Hello people,
>
> I just installed EMBOSS on Ubuntu Linux (using the Synaptic package manager). I am using the getorf program, and it gives me output like this:
>
> >00001_3 [803 - 1120]
> LARLRFVVLGNSFIASAKGWSTPYGPTTFGPFRSCIYPRVFRSTRVRKAMATRIGSNRVN
> ILIRCTXXXXXXXXXXXXXXXXXXXXXXXXXNPYLGWWCYIFCIFR
>
> I don't like the Xs, as they represent unspecified amino acids. Is there an input parameter that tells the program to report only the regions before and after the Xs?
>
> In addition (and maybe this is beyond the scope of this mailing list), what is the biological meaning of such Xs?

What was the input sequence like? Was there a stretch of NNNNN perhaps?

Peter



      




