[BioRuby] Alignment plugin
diapriid at gmail.com
Mon Apr 26 17:00:06 UTC 2010
Another conundrum I just thought of: in some cases gaps ("-") are used
both to infer evolutionary events (aligning columns) and as space
fillers for data that are contiguous with aligned partitions but not
themselves considered to be aligned (e.g. a partition in an MSA
containing a loop in some structural alignments). Again, this might not
be best practice, but it does happen.
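A minimal sketch of how the two uses could be kept apart, assuming the alignment carries a per-column flag saying whether that column is considered aligned. The method `classify_gaps` and the `aligned_columns` array are hypothetical names for illustration, not part of BioRuby; this is plain Ruby.

```ruby
# Hypothetical helper: label each gap in each row as a real indel (column is
# aligned) or as filler (column belongs to an unaligned partition, e.g. a loop).
# aligned_columns is an assumed per-column annotation: true = aligned.
def classify_gaps(rows, aligned_columns, gap_char: "-")
  rows.map do |row|
    gaps = []
    row.each_char.with_index do |ch, col|
      next unless ch == gap_char
      gaps << [col, aligned_columns[col] ? :indel : :filler]
    end
    gaps
  end
end

rows = ["ACG--T", "A-GCAT"]
aligned = [true, true, true, false, false, true] # columns 3-4 are a loop
p classify_gaps(rows, aligned)
# => [[[3, :filler], [4, :filler]], [[1, :indel]]]
```

The same "-" character stays in the rows; the meaning comes from the alignment-level annotation.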
While I don't know how things are modeled right now (this may
already be the case), it seems that gaps should not be properties of
sequences, but rather properties of MSAs, as they only really
exist when two or more sequences are being compared.
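A rough sketch of the idea that gaps belong to the MSA: the alignment holds the gapped rows and derives gap runs on demand (with the length and leading/trailing properties suggested in the quoted message below), while each underlying sequence stays ungapped. `Alignment` and `GapRun` are hypothetical names for illustration, not existing BioRuby classes.

```ruby
# Hypothetical: one run of consecutive gap characters in one alignment row.
GapRun = Struct.new(:row, :start, :length, :leading, :trailing)

# Hypothetical alignment class: gaps exist only here, not on the sequences.
class Alignment
  attr_reader :rows

  def initialize(rows, gap_char = "-")
    unless rows.map(&:length).uniq.size <= 1
      raise ArgumentError, "all rows must have the same length"
    end
    @rows = rows
    @gap_char = gap_char
  end

  # The sequence itself, with every gap character removed.
  def ungapped(index)
    @rows[index].delete(@gap_char)
  end

  # All gap runs in all rows; leading = precedes all residues in that row,
  # trailing = follows all residues in that row.
  def gap_runs
    @rows.each_with_index.flat_map do |seq, r|
      runs_in(seq).map do |start, len|
        GapRun.new(r, start, len, start.zero?, start + len == seq.length)
      end
    end
  end

  private

  # Returns [start, length] pairs for each maximal run of gap characters.
  def runs_in(seq)
    runs = []
    i = 0
    while i < seq.length
      if seq[i] == @gap_char
        start = i
        i += 1 while i < seq.length && seq[i] == @gap_char
        runs << [start, i - start]
      else
        i += 1
      end
    end
    runs
  end
end

aln = Alignment.new(["--ACGT-", "AAC-GTT"])
aln.gap_runs.each { |g| p g.to_a }
# [0, 0, 2, true, false]
# [0, 6, 1, false, true]
# [1, 3, 1, false, false]
puts aln.ungapped(0) # ACGT
```

Storing only the gapped rows keeps a single source of truth; the gap properties are computed rather than attached to sequence objects.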
On Mon, Apr 26, 2010 at 12:44 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Mon, Apr 26, 2010 at 4:41 PM, Matt <diapriid at gmail.com> wrote:
>>> Each nucleotide/amino acid has properties by itself. Do gaps have
>> Not sure if you mean a single gap ("-") or any gap between
>> nucleotides. If the latter, then it would be nice if gaps had
>> properties like
>> * length
>> * at_beginning_sequence (precedes all bases)
>> * at_end_of_sequence (follows all bases)
>> Another distinction: gaps found in a typical MSA may indicate real
>> gaps (as inferred from evolutionary events in an alignment) or
>> missing data, depending on how sloppy/precise a person is.
> It is worse than that: you have leading padding, trailing padding
> and insertions, which are all often represented with the same
> character. Then there are special cases like HMMER, which
> uses two gap characters (- and .) depending on the model state.
> Then you have the other meaning of dot (.), as in PHYLIP format
> and some visualisations, where it means "same as the first sequence".
> Plus, to keep life interesting, some formats (e.g. ACE) use the
> asterisk (*) as the gap character, although it is usually a stop
> symbol when working with protein sequences.
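Because the same symbol can mean different things depending on the source, one option is to normalize gaps to "-" only when the caller states which convention the input follows. A minimal sketch; the `Convention` struct, the `CONVENTIONS` table and `normalize_rows` are hypothetical, and the table only covers the cases mentioned in this thread. Note that collapsing HMMER's two gap characters into one discards the model-state distinction, so a real plugin might prefer to keep it.

```ruby
# Hypothetical per-format conventions; the caller must know the source format.
Convention = Struct.new(:gap_chars, :dot_means_identity)

CONVENTIONS = {
  plain:       Convention.new(["-"], false),
  hmmer:       Convention.new(["-", "."], false), # collapsing loses model-state info
  phylip_dots: Convention.new(["-"], true),       # "." = same residue as first row
  ace:         Convention.new(["*"], false)       # "*" is a gap here, not a stop
}.freeze

# Rewrite every row so that gaps are "-" and identity dots are expanded.
def normalize_rows(rows, format)
  conv = CONVENTIONS.fetch(format) # raises KeyError for an unknown format
  unless rows.map(&:length).uniq.size <= 1
    raise ArgumentError, "all rows must have the same length"
  end
  reference = rows.first
  rows.each_with_index.map do |row, r|
    row.each_char.with_index.map do |ch, col|
      if conv.dot_means_identity && ch == "." && r > 0
        reference[col]            # copy the residue from the first sequence
      elsif conv.gap_chars.include?(ch)
        "-"
      else
        ch
      end
    end.join
  end
end

p normalize_rows(["ACGT", "..-A", ".CG."], :phylip_dots)
# => ["ACGT", "AC-A", "ACGT"]
p normalize_rows(["AC*GT", "ACTGT"], :ace)
# => ["AC-GT", "ACTGT"]
```

Requiring an explicit format avoids guessing whether "*" is a gap or a stop, or whether "." is a gap or an identity marker.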