alignment sequence reading with stop codons (bug?)

Jason Stajich jason at cgt.mc.duke.edu
Wed Dec 19 21:01:23 UTC 2001


I noticed this while playing with our new bioperl wrappers for EMBOSS.
Apparently -seqall does not read sequences containing stop codons ('*').
I can submit this as a bug if that is more appropriate; I am still getting
warmed up to the EMBOSS dev process.

This occurs with both EMBOSS-1.9.1 and the CVS code I checked out today
(2.0.1, I guess).

The workaround is of course to pass the sequence containing stop codons as
the first argument (see the second water run below), or to replace each
stop codon with something like X.  I know which sequences will potentially
have stop codons, so I can work around this in my own code.
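
The second workaround (replacing each stop codon with X) can be sketched in
a few lines of Python. This is only an illustration, not part of EMBOSS or
bioperl: the function name mask_stop_codons is hypothetical, and it assumes
plain FASTA input where header lines start with '>'. Header lines are left
untouched so that a '*' in a description is not altered.

```python
def mask_stop_codons(fasta_text: str, replacement: str = "X") -> str:
    """Replace '*' in the sequence lines of FASTA text.

    Hypothetical helper for illustration; header lines (starting with '>')
    are copied through unchanged.
    """
    masked_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: keep as-is.
            masked_lines.append(line)
        else:
            # Sequence line: swap every stop-codon symbol for the replacement.
            masked_lines.append(line.replace("*", replacement))
    return "\n".join(masked_lines) + "\n"


# First line of the Contig5745 record from this message.
example = ">Contig5745\nCLIF*RLLLIQMIHPQARRAFTFLQQQEPYRIQSMEQLSTLLWHLADLPALSHLSQSLIS\n"
masked = mask_stop_codons(example)
print(masked, end="")
```

The masked text would then be written to a new file and passed to water in
place of the original. Note that X may be scored differently from '*' by the
comparison matrix, so alignments near the masked positions can change.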

[jason at gordola crypto_intergenic]$ cat jason.seq
>SW-CC27_YEAST SW:CC27_YEAST P38042 saccharomyces cerevisiae (baker's yeast). cell division control protein 27. 10/2001; PIR:S45825 cell division control protein CDC27 - yeast (Saccharomyces cerevisia
MAVNPELAPFTLSRGIPSFDDQALSTIIQLQDCIQQAIQQLNYSTAEFLAELLYAECSIL
DKSSVYWSDAVYLYALSLFLNKSYHTAFQISKEFKEYHLGIAYIFGRCALQLSQGVNEAI
LTLLSIINVFSSNSSNTRINMVLNSNLVHIPDLATLNCLLGNLYMKLDHSKEGAFYHSEA
LAINPYLWESYEAICKMRATVDLKRVFFDIAGKKSNSHNNNAASSFPSTSLSHFEPRSQP
SLYSKTNKNGNNNINNNVNTLFQSSNSPPSTSASSFSSIQHFSRSQQQQANTSIRTCQNK
NTQTPKNPAINSKTSSALPNNISMNLVSPSSKQPTISSLAKVYNRNKLLTTPPSKLLNND
RNHQNNNNNNNNNNNNNNNNNNNNNNNNIINKTTFKTPRNLYSSTGRLTTSKKNPRSLII
SNSILTSDYQITLPEIMYNFALILRSSSQYNSFKAIRLFESQIPSHIKDTMPWCLVQLGK
LHFEIINYDMSLKYFNRLKDLQPARVKDMEIFSTLLWHLHDKVKSSNLANGLMDTMPNKP
ETWCCIGNLLSLQKDHDAAIKAFEKATQLDPNFAYAYTLQGHEHSSNDSSDSAKTCYRKA
LACDPQHYNAYYGLGTSAMKLGQYEEALLYFEKARSINPVNVVLICCCGGSLEKLGYKEK
ALQYYELACHLQPTSSLSKYKMGQLLYSMTRYNVALQTFEELVKLVPDDATAHYLLGQTY
RIVGRKKDAIKELTVAMNLDPKGNQVIIDELQKCHMQE

[jason at gordola crypto_intergenic]$ cat prot.seq
>Contig5745
CLIF*RLLLIQMIHPQARRAFTFLQQQEPYRIQSMEQLSTLLWHLADLPALSHLSQSLIS
ISRSSPQAWIAVGNCFSLQKDHDEAMRCFRRATQVDEGCAYAWTLCGYEAVEMEEYERAM
AFYRTAIRTDARHYNAWYVLFFFFFFFFVPGDIDS*PKKGMEWG*FISKRIDRGMRSIIL
KEPSKSIQLIPFFYVALVW*VGVSSYPLETMTNIDFPKKKKALEKSNDVVQALHFYERAS
KYAPTSAMVQFKRIRALVALQRYDEAISALVPLTHSAPDEANVFFLLGKCLLKKERRQEA
TMAFTNARELEPK

[jason at gordola crypto_intergenic]$ water jason.seq prot.seq
Smith-Waterman local alignment.
   An error has been found: Sequence Contig5745 must be protein sequence, found bad character '*'
   An error has been found: option -seqall: Unable to read sequence 'prot.seq'
   There is a serious problem: water terminated: Bad value for option and no prompt

[jason at gordola crypto_intergenic]$ water prot.seq jason.seq
Smith-Waterman local alignment.
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output file [contig5745.water]:

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu



