[EMBOSS] protein sequence input
pmr at ebi.ac.uk
Wed Jul 28 14:13:37 UTC 2004
Hi Bobby,
> I am using the emboss to make protein sequence
> analysis.
>
> I want the program "water"(smith-waterman algorithm)
> to take in the characters "J","O","U" which are not
> aminoacid symbols.
>
> Can I change the code?, If so, in which file I have to
> make this change, to make the program take this
> desired input
Interesting question. The sequence types are checked in ajax/ajseqtype.c,
so that is where a change would go. But there is also the question of
whether your sequence is really a protein.
Perhaps we should allow "alpha" as a sequence type, with its own
comparison matrices. It could get complicated (we need to check whether we
assume all non-nucleic sequences are protein, for example).
"U" is a valid protein code (for selenocysteine). "O" is used as a gap
character by some formats. "J" is not used. I have seen "O" and "J" used
in modified matrices before, though that was for DNA, to score CpG islands
differently (CG was converted to OJ and given higher match scores).
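To illustrate the CpG recoding mentioned above, here is a minimal Python
sketch. The function name recode_cpg is made up for this example and is
not part of EMBOSS; it assumes a plain DNA string and simply rewrites
every CG dinucleotide as OJ, so that a modified matrix could score those
positions separately.

```python
def recode_cpg(dna: str) -> str:
    """Rewrite each CG dinucleotide as OJ (hypothetical helper, not EMBOSS code)."""
    # str.replace scans left to right and replaces non-overlapping matches.
    return dna.upper().replace("CG", "OJ")


if __name__ == "__main__":
    print(recode_cpg("acgtcgcg"))
```

The modified matrix itself still has to supply the scores for O and J.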
Perhaps, as a quick solution, you could try using the protein ambiguity
codes B and Z instead of O and J? Then you could use a normal protein
sequence.
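A minimal Python sketch of that substitution, done on the sequence before
it is passed to water. The names SUBSTITUTES and substitute_codes are
made up for this example, and the mapping O to B, J to Z is only one
possible choice. The sketch assumes the input does not already use B or Z
and refuses to run if it does, since the two meanings could not be told
apart afterwards. "U" is left unchanged.

```python
# Hypothetical mapping: stand-in ambiguity codes for the non-standard symbols.
SUBSTITUTES = {"O": "B", "J": "Z"}


def substitute_codes(seq: str) -> str:
    """Replace O and J with B and Z so the sequence reads as a normal protein."""
    seq = seq.upper()
    # B or Z already present would become indistinguishable from recoded O or J.
    clash = sorted(set(seq) & set(SUBSTITUTES.values()))
    if clash:
        raise ValueError("sequence already contains " + ", ".join(clash))
    return seq.translate(str.maketrans(SUBSTITUTES))


if __name__ == "__main__":
    print(substitute_codes("MKOJUAL"))
```

The recoded positions are then scored with whatever values the chosen
comparison matrix gives B and Z, so the matrix may still need adjusting
if O and J are meant to score differently.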
Hope that helps,
Peter