[EMBOSS] Antwort: transeq changes sequence id

David.Bauer at SCHERING.DE David.Bauer at SCHERING.DE
Wed Jan 24 07:06:17 UTC 2007


Hi,

The _1 indicates the frame that was used for translation.
You can use
transeq myseq.fa -frame 1,2
and this gives a FASTA file with two protein sequences (one per frame).
That's where the added number makes sense: it prevents the creation of
protein sequences that all have the same ID.

So much for the philosophy behind this number ;-)

And now a solution for your problem:

transeq test.fa | descseq -filter -name `infoseq -nohead -only -name test.fa`

This works only if you have just one sequence in the input file. If you
have a multiple sequence fasta file, you can use seqretsplit to create
individual sequence files for each sequence.
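
If splitting the file is inconvenient, another option is to post-process the
headers with plain sed rather than with an EMBOSS tool. The sketch below is
only an illustration: strip_frame_suffix is a hypothetical helper name, and it
assumes the translated IDs end in _1 to _6 and that only one frame was
translated (with several frames, stripping the suffix would bring back the
duplicate IDs the number is meant to prevent).

```shell
# Hypothetical helper (not part of EMBOSS): remove a trailing _1.._6 frame
# suffix from the ID token of every FASTA header line read on stdin.
# Sequence lines and any description after the ID are left unchanged.
strip_frame_suffix() {
    sed -E '/^>/ s/^(>[^[:space:]]+)_[1-6]([[:space:]]|$)/\1\2/'
}

# Hand-written input standing in for transeq output:
printf '>seqA_1\nMKV\n>seqB_1 some description\nMLT\n' | strip_frame_suffix
```

With real data this could be used as
transeq multi.fa -auto -stdout | strip_frame_suffix > multi.pep
assuming the global -auto and -stdout qualifiers are used to suppress the
prompts and write the translation to standard output.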

HTH,
David.

emboss-bounces at lists.open-bio.org wrote on 23/01/2007 20:23:28:

> I'm using transeq to translate a bunch of sequences for me and noticed that
> upon translation, it adds a '_1' to the seqid.  For example:
>
> I give it a file with
> >myseq
> ATG...TAG
>
> After translation, the resulting file contains:
> >myseq_1
> M...
>
> Is there a way to prevent transeq from manipulating the FASTA header and
> just translate the sequence?
>
> Ryan