alignment sequence reading with stop codons (bug?)

David Bauer bauer at genprofile.com
Thu Dec 20 07:02:56 UTC 2001


Hi,

the protein alignment programs don't like the '*' in your protein
sequences. They are designed to align true proteins, which usually do
not contain stop codons.
If these are putative ORFs, a solution would be to split them up at the
stops, creating a separate protein sequence for each ORF.
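
A minimal sketch of that splitting step in Python, in case it is useful.
The function names and the "name_1, name_2, ..." naming scheme are my own
assumptions for illustration, not part of EMBOSS or bioperl:

```python
def split_at_stops(protein, min_length=1):
    """Split a translated sequence at '*' and return the fragments.

    Fragments shorter than min_length (including the empty strings
    produced by a trailing or doubled '*') are dropped.
    """
    fragments = protein.strip().split("*")
    return [frag for frag in fragments if len(frag) >= min_length]


def to_fasta(name, protein):
    """Format each fragment as its own FASTA record (hypothetical naming)."""
    records = []
    for index, frag in enumerate(split_at_stops(protein), start=1):
        records.append(f">{name}_{index}\n{frag}\n")
    return "".join(records)
```

Each resulting record can then be saved to its own file and handed to the
alignment program as an ordinary protein sequence.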

I also guess you are misinterpreting -seqall. It means to return
all sequences from a file containing more than one sequence (like a
FASTA-formatted file with several sequences separated by their
description lines). To me the -seqall option does not make much sense
for alignment programs, which need exactly 2 sequences to
align.
There you must always pass the two sequence files you want to align
as arguments to the alignment program, and each file must contain
exactly one sequence.
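
If your sequences currently sit together in one multi-sequence FASTA
file, something like the following Python sketch could write them out
one per file. The function names and the seq_1.fasta, seq_2.fasta, ...
file names are hypothetical, chosen only for this example:

```python
import os


def read_fasta(text):
    """Parse FASTA text into a list of (description, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_single_sequence_files(text, out_dir):
    """Write each record to its own file and return the file paths."""
    paths = []
    for index, (header, seq) in enumerate(read_fasta(text), start=1):
        path = os.path.join(out_dir, f"seq_{index}.fasta")
        with open(path, "w") as handle:
            handle.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

Any two of the resulting files can then be given to the alignment
program as its two input sequences.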

I hope this helps,

David Bauer.


Jason Stajich wrote:
> 
> I noticed this in playing with our new bioperl wrappers for EMBOSS.
> Apparently -seqall does not read sequences with stop codons.
> I can submit as a bug if that is more appropriate.  Getting warmed up to
> the EMBOSS dev process.



