[BioRuby] Alignment plugin
biopython at maubp.freeserve.co.uk
Mon Apr 26 16:44:04 UTC 2010
On Mon, Apr 26, 2010 at 4:41 PM, Matt <diapriid at gmail.com> wrote:
>> Each nucleotide/aminoacid has properties by itself. Do gaps have
> Not sure if you mean a single gap ("-") or any gap between
> nucleotides. If the latter then it would be nice if gaps had
> properties like
> * length
> * at_beginning_sequence (precedes all bases)
> * at_end_of_sequence (found at end of all bases)
> Another distinction: gaps found in a typical MSA may indicate real
> gaps (as inferred from evolutionary events in an alignment) or
> missing data, depending on how sloppy/precise a person is.
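The gap properties suggested above (length, leading, trailing) could be sketched roughly as follows. This is only an illustration: GapRun, gap_runs and the at_beginning/at_end field names are hypothetical and not part of BioRuby, and "-" as the default gap character is an assumption.

```ruby
# Hypothetical sketch, not a BioRuby API: describe every run of gap
# characters in a single aligned sequence string.
GapRun = Struct.new(:start, :length, :at_beginning, :at_end)

def gap_runs(aligned, gap_chars = "-")
  # Build a character class from the gap characters (assumed default: "-").
  pattern = /[#{Regexp.escape(gap_chars)}]+/
  runs = []
  aligned.scan(pattern) do
    match = Regexp.last_match
    start = match.begin(0)
    length = match[0].length
    runs << GapRun.new(
      start,
      length,
      start.zero?,                      # run precedes all bases
      start + length == aligned.length  # run follows all bases
    )
  end
  runs
end

gap_runs("--AC-GT---").each do |run|
  puts "start=#{run.start} length=#{run.length} " \
       "leading=#{run.at_beginning} trailing=#{run.at_end}"
end
```

Note that this only reports where gaps are; it cannot tell an inferred evolutionary gap from missing data, which would need extra annotation from the user or the file format.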
It is worse than that - you have leading padding, trailing padding
and insertions which are all often represented with the same
character. Then there are special cases like HMMER which
uses two gap characters (- and .) depending on the model state.
Then you have the other meaning of dot (.), as in PHYLIP format
and some visualisations, where it means "same as the first sequence".
Plus, to keep life interesting, some formats (e.g. ACE) use the
asterisk (*) as the gap character (usually a stop symbol when
working with protein sequences).
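One way to cope with these conventions is to make the gap characters an explicit, per-format setting rather than hard-coding "-". The sketch below is illustrative only: GAP_CHARS, normalize_gaps and expand_match_dots are hypothetical names, not BioRuby methods, and the table covers just the cases mentioned in this message.

```ruby
# Hypothetical per-format table of gap characters (illustrative only).
GAP_CHARS = {
  default: "-",
  hmmer:   "-.",  # two gap characters, depending on the model state
  ace:     "*"    # asterisk used as the gap character
}.freeze

# Rewrite every gap character of the given convention as "-".
def normalize_gaps(seq, convention = :default)
  chars = GAP_CHARS.fetch(convention)
  seq.each_char.map { |c| chars.include?(c) ? "-" : c }.join
end

# For dot-means-match input (as in PHYLIP-style output), replace each "."
# in later rows with the character at the same column of the first row.
def expand_match_dots(rows)
  first = rows.first
  rows.each_with_index.map do |row, i|
    next row if i.zero?
    row.each_char.with_index.map { |c, j| c == "." ? (first[j] || c) : c }.join
  end
end

puts normalize_gaps("AC.-gt", :hmmer)
puts normalize_gaps("AC*GT", :ace)
puts expand_match_dots(["ACGT", "..C.", "A..-"]).inspect
```

Since the dot has two incompatible meanings, the caller has to say which convention applies; nothing in the sequence text itself can settle it.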