[BioRuby] Alignment plugin
pjotr.public14 at thebird.nl
Mon Apr 26 16:30:55 UTC 2010
On Mon, Apr 26, 2010 at 04:40:11PM +0100, Rutger Vos wrote:
> On Mon, Apr 26, 2010 at 4:04 PM, Pjotr Prins <pjotr.public14 at thebird.nl> wrote:
> > Maybe we should start defining a basic sequence object. What would we
> > want from it, what should be core and what should be mixed in?
> > Alignments and secondary structures should build on that.
> In the interest of learning from other Bio* projects ;-) it should be
> noted that there is a bit of a mismatch between sequences as
> standalone objects on the one hand, and rows within character state
> matrices on the other, especially when you consider types of data
> beyond molecular sequences (e.g. morphological character state data).
> Within a matrix there are columns such that every cell in a sequence
> now becomes a concrete instance of one of a limited set of character
> states for that character/column. Especially for morphological data
> there could be very esoteric ambiguity mappings from one state in that
> column to another. Imagine an alignment with unique mappings a la the
> IUPAC single character codings for each column. The upshot might be
> that you'd need a mapping object for each cell, though you'd use an
> immutable class for molecular data.
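Here is a minimal Ruby sketch of the per-column mapping Rutger describes. It assumes each column holds a hash from a state symbol to the concrete states that symbol may stand for. CharacterColumn, IUPAC_DNA and the morphological states are hypothetical names used only for illustration, and IUPAC_DNA covers only a subset of the IUPAC nucleotide codes. Molecular columns can share one frozen table, while a morphological column supplies its own mapping.

```ruby
# Hypothetical sketch, not a BioRuby API: a column maps each state symbol
# to the set of concrete states it may represent.

# Subset of the IUPAC nucleotide codes, frozen so every DNA column can share it.
IUPAC_DNA = {
  'A' => %w[A], 'C' => %w[C], 'G' => %w[G], 'T' => %w[T],
  'R' => %w[A G], 'Y' => %w[C T], 'N' => %w[A C G T]
}.transform_values(&:freeze).freeze

class CharacterColumn
  attr_reader :mapping

  def initialize(mapping)
    @mapping = mapping
  end

  # Expand a cell symbol into the concrete states it may stand for.
  def expand(symbol)
    @mapping.fetch(symbol) { raise ArgumentError, "unknown state #{symbol.inspect}" }
  end

  # Two cells are compatible when their expansions share at least one state.
  def compatible?(a, b)
    !(expand(a) & expand(b)).empty?
  end
end

dna_column = CharacterColumn.new(IUPAC_DNA)
dna_column.compatible?('R', 'A') # => true
dna_column.compatible?('R', 'C') # => false

# A morphological character with its own (hypothetical) states and ambiguity code.
wing_states = { '0' => %w[absent], '1' => %w[present], '?' => %w[absent present] }
wing_column = CharacterColumn.new(wing_states)
wing_column.expand('?') # => ["absent", "present"]
```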
I think I understand what you mean here. The way I see it is that the
sequences are immutable lists of nucleotides/amino acids. State can
be at row, column or individual matrix point level.
I guess it is impossible to dictate how people will want to use the data
structure. Either they use state as a loose component (which could be a
matrix) projected onto the sequences, or (if our format allows it) they
maintain state at each of the three levels (row, column, point).
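The "loose component" option could look roughly like this. The sequences stay frozen strings, and all state lives in a separate sparse matrix keyed by (row, column). StateMatrix and its methods are hypothetical and not part of BioRuby. A whole row or column is annotated by passing a range for the other axis.

```ruby
# Hypothetical sketch: state kept outside the sequences, projected onto them.
sequences = %w[ACGT-ACG ACGTTACG AC-TTACG].map(&:freeze).freeze

class StateMatrix
  def initialize
    # Sparse storage: only annotated cells get an entry.
    @cells = Hash.new { |hash, key| hash[key] = {} }
  end

  # Merge the given state into every cell covered by rows x cols
  # (each may be a single index or a range).
  def annotate(rows, cols, **state)
    Array(rows).each do |r|
      Array(cols).each { |c| @cells[[r, c]].merge!(state) }
    end
  end

  # State of one cell; an empty hash if nothing was set.
  def at(row, col)
    @cells.fetch([row, col], {})
  end
end

state = StateMatrix.new
state.annotate(0..2, 4..6, homology: :high) # columns 4..6 across all rows
state.annotate(2, 1..2, deleted: true)      # part of a single row
state.at(1, 5)[:homology] # => :high
sequences[2]              # => "AC-TTACG", the sequences themselves are untouched
```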
In my case I would like to add state into the data structure itself
(one advantage would be that it is relatively easy to export, including
to RDF). Say we have an alignment:
aln = Alignment.new(sequences)
I would like to annotate columns 4 to 6 as having high homology.
Maybe I want to remove part of sequence 3 and mark it as such:
aln.sequence(3, :position=>20..30, :deleted=>true)
or indicate an ORF:
aln.sequence(3, :position=>40..65, :orf=>true)
and fetch information, such as quality scores:
sequence = aln.sequence(3)
quality = sequence.quality(:position=>40..65)
Or any variation thereof. State would be maintained inside the
Alignment (Column), Sequence or Nucleotide/Aminoacid objects.
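For the embedded-state option, the calls proposed above could be backed by something like the following. This is a sketch under assumptions: Alignment, AlignedSequence, Annotation, column and quality are hypothetical names, positions are taken to be 0-based Ruby ranges, and each row is assumed to carry its own quality scores.

```ruby
# Hypothetical sketch: state stored inside the alignment and its rows.
Annotation = Struct.new(:range, :tags)

class AlignedSequence
  attr_reader :residues, :annotations

  def initialize(residues, qualities = nil)
    @residues = residues.dup.freeze # the sequence itself stays immutable
    @qualities = qualities          # optional per-position quality scores
    @annotations = []               # row-level state
  end

  def annotate(range, tags)
    @annotations << Annotation.new(range, tags)
    self
  end

  # Quality scores for a position range, or nil if none were supplied.
  def quality(position:)
    @qualities && @qualities[position]
  end
end

class Alignment
  attr_reader :column_annotations

  def initialize(sequences)
    @rows = sequences.map { |s| s.is_a?(AlignedSequence) ? s : AlignedSequence.new(s) }
    @column_annotations = [] # column-level state
  end

  # sequence(i) fetches row i; with position: and extra tags it also annotates it.
  def sequence(index, position: nil, **tags)
    row = @rows.fetch(index)
    row.annotate(position, tags) if position && !tags.empty?
    row
  end

  def column(range, **tags)
    @column_annotations << Annotation.new(range, tags)
    self
  end
end

# Four rows of made-up data so that sequence 3 exists.
rows = Array.new(4) { |i| AlignedSequence.new('ACGT' * 20, Array.new(80) { 30 + i }) }
aln = Alignment.new(rows)
aln.column(4..6, homology: :high)
aln.sequence(3, :position=>20..30, :deleted=>true)
aln.sequence(3, :position=>40..65, :orf=>true)
sequence = aln.sequence(3)
quality = sequence.quality(:position=>40..65)
quality.size # => 26
```

In this sketch, point-level (Nucleotide/Aminoacid) state could be expressed as a one-position range such as 42..42. A dedicated per-residue object would only be needed if that turns out to be too coarse.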