[EMBOSS] getorf includes unspecified amino acids as part of the ORF sequence

Fungazid fungazid at yahoo.com
Wed Jan 13 16:30:26 UTC 2010


Thanks Peter,

I made a mistake and used repeat-masked contigs instead of the original contigs, and they did indeed contain Ns. Sorry for the mess (still, I am looking for an option so that Ns are not included in the ORF).

Avi
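
One possible workaround, sketched here as post-processing rather than as a getorf option: split each translated ORF at runs of X and keep only the unambiguous fragments. The function name split_at_unknown and the min_len parameter (counted in residues) are assumptions made up for this illustration and are not part of EMBOSS.

```python
import re

def split_at_unknown(protein, min_len=10):
    """Split a translated ORF at runs of X.

    Returns (start_offset, fragment) pairs for every X-free fragment
    that is at least min_len residues long. Offsets are 0-based
    positions within the original peptide.
    """
    fragments = []
    # Each match is a maximal stretch containing no X at all.
    for match in re.finditer(r"[^X]+", protein.upper()):
        if len(match.group()) >= min_len:
            fragments.append((match.start(), match.group()))
    return fragments
```

Applied to the peptide quoted below, this would return the stretch before the Xs and the stretch after them as two separate fragments, provided each is at least min_len residues long.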



----- Original Message ----
From: Peter Rice <pmr at ebi.ac.uk>
To: Fungazid <fungazid at yahoo.com>
Cc: emboss at lists.open-bio.org
Sent: Tue, January 12, 2010 4:15:28 PM
Subject: Re: [EMBOSS] getorf includes unspecified amino acids as part of the ORF sequence

Hi Avi,

> The input is a simple fasta file with only A,C,T,G letters and
> nothing else, so I wouldn't expect any Xs. In addition, even if there
> would be Ns (and there are no Ns) the program cannot know if such Ns
> do not include stopcodons so it should not consider them as part of an ORF.

>>> 00001_3 [803 - 1120]
>> LARLRFVVLGNSFIASAKGWSTPYGPTTFGPFRSCIYPRVFRSTRVRKAMATRIGSNRVN
>> ILIRCTXXXXXXXXXXXXXXXXXXXXXXXXXNPYLGWWCYIFCIFR

That suggests the Xs have all come from stop codons.

There are other possibilities, including a badly formatted input file
(perhaps two sequences and descriptions read as one).

We do need to see the input file to know where those Xs are from.

Peter Rice
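
A quick way to check whether an input file really contains only A, C, G and T is to count every other character per sequence. The following is a minimal sketch that assumes a plain FASTA file; the function name non_acgt_counts and the file name in the usage note are hypothetical.

```python
from collections import Counter

def non_acgt_counts(fasta_lines):
    """Count characters other than A, C, G, T for each FASTA record.

    fasta_lines is any iterable of lines (an open file works).
    Returns {record_name: {char: count}} for records that contain
    at least one such character; clean records are omitted.
    """
    counts = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            words = line[1:].split()
            name = words[0] if words else ""
            counts[name] = Counter()
        elif name is not None:
            counts[name].update(ch for ch in line.upper() if ch not in "ACGT")
    return {n: dict(c) for n, c in counts.items() if c}
```

For example, `with open("contigs.fasta") as fh: print(non_acgt_counts(fh))` would list each contig that contains Ns or other ambiguity codes, which is what a repeat-masked file would show.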



      




