alignment sequence reading with stop codons (bug?)
Jason Stajich
jason at cgt.mc.duke.edu
Wed Dec 19 21:01:23 UTC 2001
I noticed this while playing with our new bioperl wrappers for EMBOSS.
Apparently -seqall does not read protein sequences that contain stop
codons ('*'). I can submit this as a bug report if that is more
appropriate; I'm still getting warmed up to the EMBOSS dev process.
This occurs with both EMBOSS-1.9.1 and the CVS code I checked out today
(2.0.1, I guess).
The workaround is, of course, to give the arguments in the other order
(the sequence with stop codons first, as in the second run below) or to
replace the stop codon with something like X. I know which sequences
will have potential stop codons, so I can work around this in my own code.
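A minimal sketch of the X-replacement workaround, assuming a POSIX sed is
available. The helper name mask_stops and the output file prot_masked.seq
are hypothetical; the idea is only to turn every '*' in the sequence lines
into 'X' while leaving FASTA header lines untouched, so that -seqall should
no longer reject the file for the bad character.

```shell
# Hypothetical helper (not part of EMBOSS or bioperl): read FASTA on stdin,
# write it to stdout with each stop codon '*' replaced by 'X'.
mask_stops() {
    # '/^>/!' applies the substitution only to lines that are NOT headers,
    # so a '*' inside a description line is left as-is.
    sed '/^>/!s/\*/X/g'
}

# Example usage with the files from the transcript below:
#   mask_stops < prot.seq > prot_masked.seq
#   water jason.seq prot_masked.seq
```

Note that this changes the residues the aligner sees at those positions, so
it is only appropriate when you already know where the stop codons are and
are happy to treat them as unknown residues.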
[jason at gordola crypto_intergenic]$ cat jason.seq
>SW-CC27_YEAST SW:CC27_YEAST P38042 saccharomyces cerevisiae (baker's yeast). cell division control protein 27. 10/2001; PIR:S45825 cell division control protein CDC27 - yeast (Saccharomyces cerevisia
MAVNPELAPFTLSRGIPSFDDQALSTIIQLQDCIQQAIQQLNYSTAEFLAELLYAECSIL
DKSSVYWSDAVYLYALSLFLNKSYHTAFQISKEFKEYHLGIAYIFGRCALQLSQGVNEAI
LTLLSIINVFSSNSSNTRINMVLNSNLVHIPDLATLNCLLGNLYMKLDHSKEGAFYHSEA
LAINPYLWESYEAICKMRATVDLKRVFFDIAGKKSNSHNNNAASSFPSTSLSHFEPRSQP
SLYSKTNKNGNNNINNNVNTLFQSSNSPPSTSASSFSSIQHFSRSQQQQANTSIRTCQNK
NTQTPKNPAINSKTSSALPNNISMNLVSPSSKQPTISSLAKVYNRNKLLTTPPSKLLNND
RNHQNNNNNNNNNNNNNNNNNNNNNNNNIINKTTFKTPRNLYSSTGRLTTSKKNPRSLII
SNSILTSDYQITLPEIMYNFALILRSSSQYNSFKAIRLFESQIPSHIKDTMPWCLVQLGK
LHFEIINYDMSLKYFNRLKDLQPARVKDMEIFSTLLWHLHDKVKSSNLANGLMDTMPNKP
ETWCCIGNLLSLQKDHDAAIKAFEKATQLDPNFAYAYTLQGHEHSSNDSSDSAKTCYRKA
LACDPQHYNAYYGLGTSAMKLGQYEEALLYFEKARSINPVNVVLICCCGGSLEKLGYKEK
ALQYYELACHLQPTSSLSKYKMGQLLYSMTRYNVALQTFEELVKLVPDDATAHYLLGQTY
RIVGRKKDAIKELTVAMNLDPKGNQVIIDELQKCHMQE
[jason at gordola crypto_intergenic]$ cat prot.seq
>Contig5745
CLIF*RLLLIQMIHPQARRAFTFLQQQEPYRIQSMEQLSTLLWHLADLPALSHLSQSLIS
ISRSSPQAWIAVGNCFSLQKDHDEAMRCFRRATQVDEGCAYAWTLCGYEAVEMEEYERAM
AFYRTAIRTDARHYNAWYVLFFFFFFFFVPGDIDS*PKKGMEWG*FISKRIDRGMRSIIL
KEPSKSIQLIPFFYVALVW*VGVSSYPLETMTNIDFPKKKKALEKSNDVVQALHFYERAS
KYAPTSAMVQFKRIRALVALQRYDEAISALVPLTHSAPDEANVFFLLGKCLLKKERRQEA
TMAFTNARELEPK
[jason at gordola crypto_intergenic]$ water jason.seq prot.seq
Smith-Waterman local alignment.
An error has been found: Sequence Contig5745 must be protein sequence,
found bad character '*'
An error has been found: option -seqall: Unable to read sequence
'prot.seq'
There is a serious problem: water terminated: Bad value for option and
no prompt
[jason at gordola crypto_intergenic]$ water prot.seq jason.seq
Smith-Waterman local alignment.
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output file [contig5745.water]:
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu