[Biojava-l] Introducing a mutation in a DNA sequence

Thu Apr 2 10:12:04 UTC 2015

Hi!
Sorry for the giant text in my previous message; I don't know what happened.
Please find a comment below.
Paolo

2015-04-02 11:02 GMT+02:00 Ben Stöver <benstoever at uni-muenster.de>:

> Although this probably works perfectly fine for a lot of tasks, I think it
> would have disadvantages when sequentially applying lots of mutations/edits
> to a sequence (e.g. in a GUI-based alignment editor with Sequence objects as
> data backend, or an application that simulates evolution of sequences along
> a large phylogenetic tree by sequentially applying mutations). In such cases
> the resulting sequence (containing all mutations/edits) would be a sequence
> view on top of a stack of other sequence views (each of them defining one
> mutation). Depending on the index, a call of getCompoundAt() would lead to a
> traceback of the whole stack in the worst case (if that compound was present
> in the initial sequence). (I hope this was more understandable this time?)
>
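
To make the worst case described above concrete, here is a minimal, self-contained Java sketch. It does not use BioJava: Seq, BaseSeq and SubstitutionView are hypothetical names made up for illustration, and lookupDepth() is only a counter added to make the delegation chain visible. Each view records one substitution and forwards every other position to its parent, so reading an untouched position after n edits visits n views plus the base sequence.

```java
/** Hypothetical read-only sequence abstraction (NOT the BioJava Sequence interface). */
interface Seq {
    char getCompoundAt(int position);  // 1-based position
    int lookupDepth(int position);     // number of objects visited to answer the lookup
}

/** The initial, unmodified sequence at the bottom of the stack. */
final class BaseSeq implements Seq {
    private final String compounds;

    BaseSeq(String compounds) { this.compounds = compounds; }

    @Override public char getCompoundAt(int position) { return compounds.charAt(position - 1); }
    @Override public int lookupDepth(int position) { return 1; }
}

/** A view that overrides exactly one position and delegates everything else. */
final class SubstitutionView implements Seq {
    private final Seq parent;
    private final int position;
    private final char compound;

    SubstitutionView(Seq parent, int position, char compound) {
        this.parent = parent;
        this.position = position;
        this.compound = compound;
    }

    @Override public char getCompoundAt(int p) {
        return p == position ? compound : parent.getCompoundAt(p);
    }

    @Override public int lookupDepth(int p) {
        return p == position ? 1 : 1 + parent.lookupDepth(p);
    }
}

final class ViewStackDemo {
    /** Stacks `count` single-position views on top of `base`, alternating T and G. */
    static Seq applyEdits(Seq base, int position, int count) {
        Seq current = base;
        for (int i = 0; i < count; i++) {
            current = new SubstitutionView(current, position, i % 2 == 0 ? 'T' : 'G');
        }
        return current;
    }

    public static void main(String[] args) {
        Seq edited = applyEdits(new BaseSeq("ACGT"), 2, 1000);
        // The edited position is answered by the topmost view.
        System.out.println(edited.getCompoundAt(2) + " depth " + edited.lookupDepth(2));
        // An untouched position walks through all 1000 views down to the base.
        System.out.println(edited.getCompoundAt(1) + " depth " + edited.lookupDepth(1));
    }
}
```

The lookup cost for untouched positions grows linearly with the number of edits, which is the behaviour the quoted paragraph is concerned about.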

> For such applications it would be nice to have a Sequence implementation
> (extending interface) or anything similar that is able to really edit the
> underlying sequence without creating a stack of views with increasing size.
> I have some applications that would benefit from this (e.g.
> http://bioinfweb.info/LibrAlign/ ), but of course I would like to ask the
> community how relevant such a feature would be in general.
>

Your project looks very interesting!
I don't think a stack of views is strictly necessary if mutation objects
(say we define Mutation, an interface that describes a mutation) are stored
in a collection inside a Sequence object. Every subsequent edit could then
be added to the others in the same Sequence object.

Moreover, just as an idea, a SequenceView could let you retrieve only the
sequence string, with mutations applied, for a specified range, which may
speed up the rendering process.
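
A minimal sketch of that idea, assuming only length-preserving point substitutions (insertions and deletions would shift coordinates and need more bookkeeping). It is plain Java and does not use BioJava: MutatedSequence, substitute() and getSubSequence() are hypothetical names, not existing API. Positions are 1-based, as in the EditSequenceTest example quoted further down. All edits live in one sorted map, so a lookup never walks a chain of views, and a range query touches only the mutations inside that range.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: an immutable parent sequence plus a collection of substitutions. */
final class MutatedSequence {
    private final String parent;  // wild-type sequence, never modified
    // 1-based position -> replacement compound; a later edit at the same position wins
    private final TreeMap<Integer, Character> substitutions = new TreeMap<>();

    MutatedSequence(String parent) { this.parent = parent; }

    /** Records a point substitution without copying the parent sequence. */
    void substitute(int position, char compound) {
        if (position < 1 || position > parent.length()) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        substitutions.put(position, compound);
    }

    /** One map lookup, independent of how many edits were applied. */
    char getCompoundAt(int position) {
        Character mutated = substitutions.get(position);
        return mutated != null ? mutated : parent.charAt(position - 1);
    }

    /** Returns the mutated string for the inclusive 1-based range [start, end]. */
    String getSubSequence(int start, int end) {
        if (start < 1 || end > parent.length() || start > end) {
            throw new IndexOutOfBoundsException("range " + start + ".." + end);
        }
        char[] out = parent.substring(start - 1, end).toCharArray();
        // Only the mutations that fall inside the requested range are visited.
        for (Map.Entry<Integer, Character> e
                : substitutions.subMap(start, true, end, true).entrySet()) {
            out[e.getKey() - start] = e.getValue();
        }
        return new String(out);
    }
}

final class MutatedSequenceDemo {
    public static void main(String[] args) {
        MutatedSequence seq = new MutatedSequence("ACGT");
        seq.substitute(2, 'T');
        System.out.println(seq.getSubSequence(1, 4));
        System.out.println(seq.getSubSequence(2, 3));
    }
}
```

A renderer that only draws a visible window could call getSubSequence() for that window instead of materialising the whole mutated sequence.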

>
> @Andreas: Why would I not have an EditableSequence interface extend from
> Sequence?
>
> Generally I would have nothing against this, because all methods from
> Sequence could be inherited and implementations of EditableSequence could be
> passed to methods which have parameters of type Sequence. (This was my
> initial idea, which I posted in November.) Of course that would still
> violate the idea of atomic sequences a bit, because methods with parameters
> of type Sequence would have to check if the passed objects also implement
> EditableSequence to know that they cannot assume their contents to be
> immutable. In that case such methods could e.g. throw an exception if they
> cannot handle EditableSequences, but each method out there would have to
> implement this behavior.
>
> Best
> Ben
>
>
> Andreas Prlic schrieb am 2015-04-02:
> > Hi,
>
> > I agree with Ben's summary. The basic philosophy is that sequences are
> > not mutable. It is clear that we need some mechanism to introduce
> > mutations in sequences, without having to allocate a copy of the sequence
> > in memory.
>
> > About Mark's suggestion: I think Paolo's comment to represent mutations
> > via a "SequenceView" goes in a similar direction.
>
> > I hear two suggestions for how to do this so far:
>
> > A) Mutations via a SequenceView
> > B) introduction of an EditableSequence interface.
>
> > Ben: Could you comment a bit further why you would not have an
> > EditableSequence interface extend from Sequence?
>
> > ==
>
> > Having said that, currently sequence manipulation is possible via "edits",
> > however I suspect this is too complicated from an API perspective?
>
> > From EditSequenceTest:
>
> > public void substitute() throws CompoundNotFoundException {
> >   DNASequence seq = new DNASequence("ACGT");
> >   // Positions are 1-based; every edit is applied to the same unmodified seq.
> >   assertSeq(new Edit.Substitute<NucleotideCompound>("T", 2).edit(seq),
> >       "ATGT");
> >   assertSeq(new Edit.Substitute<NucleotideCompound>("TT", 2).edit(seq),
> >       "ATTT");
> >   assertSeq(new Edit.Substitute<NucleotideCompound>("T", 1).edit(seq),
> >       "TCGT");
> >   assertSeq(new Edit.Substitute<NucleotideCompound>("TTC", 2).edit(seq),
> >       "ATTC");
> > }
>
> > .edit() is using the JoiningSequenceReader under the hood which has a
> > getCompoundAt method.
>
> > Andreas
>
>
>
>
>
>
>
> > On Wed, Apr 1, 2015 at 3:23 PM, Paolo Pavan <paolo.pavan at gmail.com>
> > wrote:
>
> > > Thank you Mark, I think it would be better to clarify this point; I may
> > > have a different idea in mind.
>
> > > Are we talking about a sequence object that, given a "parent" sequence,
> > > will show the result of applying a set of mutation descriptors?
> > > Should this result still be a Sequence object, so that it will be
> > > possible to apply any further processing that takes an AbstractSequence
> > > as input (e.g. performing a sequence alignment with SmithWaterman)?
> > > Should this result be the same Sequence object that was given as input,
> > > which, through some mechanism yet to be implemented, will show a
> > > sequence string that differs from the original as a result of applying
> > > the mutation descriptors?
>
> > > If so, why not implement it with SequenceView, the same mechanism by
> > > which we get a reverse-complemented sequence?
> > > If this were done, there would be no need for a new EditableSequence
> > > interface and conversion to/from Sequence, or am I wrong?
> > > Ben, could you clarify your concerns with such a design? Why do you
> > > still see advantages in a mutable implementation of Sequence instead?
>
> > > 2015-04-01 19:13 GMT+02:00 Mark Fortner <phidias51 at gmail.com>:
>
> > >> Just out of curiosity, could mutations be applied as annotations to a
> > >> wild-type sequence? The sequence would remain unedited, but you would
> > >> still be able to represent the mutation and related annotations. This
> > >> might work for SNPs and indels, but I'm not sure how you would deal
> > >> with chromosomal translocations.
>
> > >> Also, would it be useful to be able to reference external variant
> > >> databases like ClinVar or SwissVar when specifying a mutation?
>
> > >> Regards,
>
> > >> Mark
>
>
> > >> On Wed, Apr 1, 2015 at 9:20 AM, Ben Stöver
> > >> <benstoever at uni-muenster.de>
> > >> wrote:
>
> > >>> Hi Paolo and all,
>
> > >>> yes, I guess that is the reason. Imagine a SequenceView implementation
> > >>> that stores indices of the underlying sequence to make its
> > >>> modifications. If the underlying sequence could be modified, the
> > >>> indices in the view would become invalid and all views of a Sequence
> > >>> would have to be notified about the change (which would require the
> > >>> implementation of an observer pattern in Sequence, which is currently
> > >>> not present). I guess the need for this logic change was the reason
> > >>> for keeping Sequence implementations atomic. But maybe Andreas could
> > >>> comment on this, because that's just my interpretation of his opinion.
>
> > >>> Although these are really good points, I would still agree that having
> > >>> some kind of mutable sequences would be a great thing, because mutating
> > >>> or modifying sequences is a common task and such applications might
> > >>> still want or need to rely on a sequence framework which e.g. checks
> > >>> that only valid tokens are present or offers an implementation that
> > >>> can handle changes in large sequences without having to copy
> > >>> everything to a new object, as would be the case with simple String
> > >>> objects.
>
> > >>> If other people agree that there is a need for that (I would be
> > >>> interested in feedback here) and the community could agree on a way of
> > >>> implementing it (without the disadvantages mentioned), I would be
> > >>> happy to help write the corresponding code.
>
> > >>> A separate EditableSequence interface and a tool class that converts
> > >>> between Sequence and EditableSequence (without inheriting
> > >>> EditableSequence from Sequence as I initially proposed) might be one
> > >>> option, although this would make Sequence and EditableSequence less
> > >>> compatible. I think this would have to be discussed, but it might
> > >>> really be worth it.
>
> > >>> Best
> > >>> Ben
>
>
> > >>> Paolo Pavan schrieb am 2015-03-30:
> > >>> > Hi Ben and all,
> > >>> > I'm following this thread with interest.
> > >>> > Just to look at this in more depth: what was the reason for keeping
> > >>> > the sequence atomic? Is it so that one keeps working with the same
> > >>> > instantiated object (and hence its reference) for the lifetime of
> > >>> > the software run?
> > >>> > If so, I like the idea you are suggesting of accomplishing a DNA
> > >>> > mutation with a SequenceView.
>
> > >>> > Paolo
>
> > >>> > 2015-03-30 16:36 GMT+02:00 Ben Stöver
> > >>> > <benstoever at uni-muenster.de>:
>
> > >>> > > Hi Jonas,
>
> > >>> > > I proposed deriving a subinterface "EditableSequence" (with
> > >>> > > corresponding implementations) from the existing Sequence
> > >>> > > interface on this list last November. Some people liked this
> > >>> > > idea, some did not, mainly because there seemed to be concerns
> > >>> > > that existing code (using BioJava) relies on the assumption of
> > >>> > > atomic sequences and allowing their modification might break some
> > >>> > > of this code (at least this was my interpretation of the
> > >>> > > concerns). (You can have a look at these mails in some archive or
> > >>> > > I can forward them to you, if you want to have a closer look at
> > >>> > > that discussion.)
>
> > >>> > > To my knowledge it is indeed difficult to modify sequences in the
> > >>> > > current architecture. The only way I'm aware of is creating a new
> > >>> > > SequenceView on your sequence which provides a modified view of
> > >>> > > the underlying sequence modeling your mutation. I think there are
> > >>> > > even some implementations out there based on this interface
> > >>> > > https://github.com/biojava/biojava/blob/master/biojava-core/src/main/java/org/biojava/nbio/core/sequence/edits/Edit.java
> > >>> > > but I never tried them. In my opinion, whether this approach
> > >>> > > makes sense for you is mainly a question of performance. (If you
> > >>> > > e.g. perform many mutations you would not want to create a copy
> > >>> > > of your whole sequence for each operation and have a chain of
> > >>> > > 1000 sequence views in the end.)
>
> > >>> > > Of course you are always free to create or modify an existing
> > >>> > > implementation of "Sequence" that offers additional methods for
> > >>> > > modification, but keep in mind that this would break the
> > >>> > > assumption of "atomic sequence objects", which seems to be
> > >>> > > intended in the current BioJava sequence model.
>
> > >>> > > Anyway, if anyone knows about any other ways to do that in
> > >>> > > BioJava or could think of a good way of integrating this
> > >>> > > functionality into the existing architecture (without building up
> > >>> > > an alternative sequence framework), I would be very interested to
> > >>> > > know.
>
> > >>> > > Best
> > >>> > > Ben
>
> > >>> > > Dipl. Biologe Ben Stöver
> > >>> > > Evolution und Biodiversity of Plants Group
> > >>> > > Institute for Evolution and Biodiversity
> > >>> > > University of Münster
> > >>> > > Germany
> > >>> > > http://www2.ieb.uni-muenster.de/EvolBiodivPlants/en/People/Stoever
> > >>> > > BenStoever at uni-muenster.de
>
>
>
> > >>> > > LAW Andy schrieb am 2015-03-30:
> > >>> > > > I think the philosophical view on this is that the mutated
> > >>> > > > sequence
> > >>> > > > is a *new* and *different* sequence.
>
> > >>> > > > On 30 Mar 2015, at 09:30, Jose Manuel Duarte
> > >>> > > > <jose.duarte at psi.ch>
> > >>> > > > wrote:
>
> > >>> > > > > Hi Jonas
>
> > >>> > > > > I'm not very familiar with the sequence part of Biojava, but
> > >>> > > > > after looking around a bit it seems that indeed there's no
> > >>> > > > > available way to mutate sequences. It looks like people using
> > >>> > > > > Biojava before had "read-only" applications in mind. I agree
> > >>> > > > > a setCompoundAt(int position) would be needed, it should
> > >>> > > > > actually be part of the Sequence interface. It would be a
> > >>> > > > > nice addition for 4.1.
>
> > >>> > > > > Anyway sorry I can't be of more help, perhaps someone else
> > >>> > > > > has some more background info on this.
>
> > >>> > > > > Jose
>
>
>
> > >>> > > > > On 28.03.2015 17:13, Jonas Dehairs wrote:
> > >>> > > > >> I want to introduce a mutation to a DNA sequence at a
> > >>> > > > >> particular location. I can't seem to find a suitable method
> > >>> > > > >> for this in the 4.0 API. What would make most sense to me is
> > >>> > > > >> a setCompoundAt(int position, c compound) method in the
> > >>> > > > >> AbstractSequence class, similar to the
> > >>> > > > >> getCompoundAt(int position) method, but this doesn't seem to
> > >>> > > > >> exist. And the mutator class seems to be for proteins only.
> > >>> > > > >> How can I do this?
>
>
>
>
> > >>> > > > --
> > >>> > > > The University of Edinburgh is a charitable body,
> > >>> > > > registered in
> > >>> > > > Scotland, with registration number SC005336.
>
>
> > >>> > > > _______________________________________________
> > >>> > > > Biojava-l mailing list  -  Biojava-l at mailman.open-bio.org
> > >>> > > > http://mailman.open-bio.org/mailman/listinfo/biojava-l
>
> > --
> > -----------------------------------------------------------------------
> > Dr. Andreas Prlic
> > RCSB PDB Protein Data Bank
> > University of California, San Diego
>
> > Editor Software Section
> > PLOS Computational Biology
>
> > BioJava Project Lead
> > -----------------------------------------------------------------------
>