[Biojava-l] Introducing a mutation in a DNA sequence

Ben Stöver benstoever at uni-muenster.de
Wed Apr 1 16:20:58 UTC 2015


Hi Paolo and all,

yes, I guess that is the reason. Imagine a SequenceView implementation that
stores indices of the underlying sequence to make its modifications. If the
underlying sequence could be modified, the indices in the view would become
invalid, and all views of a Sequence would have to be notified about the
change (which would require implementing an observer pattern in Sequence,
which is currently not present). I guess the need for this change in logic
was the reason for keeping Sequence implementations atomic. But maybe Andreas
could comment on this, because that's just my interpretation of his opinion.
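
To make the index problem concrete, here is a minimal sketch in plain Java. It
does not use BioJava at all: MutableBases and WindowView are hypothetical
stand-ins invented for this illustration, and positions are 0-based purely for
brevity. It only shows how a view that stores indices silently shifts when the
underlying sequence is edited and nothing notifies the view.

```java
// Hypothetical stand-in for a sequence that allows in-place edits (not a BioJava class).
class MutableBases {
    private final StringBuilder bases;

    MutableBases(String initial) {
        this.bases = new StringBuilder(initial);
    }

    char baseAt(int index) {            // 0-based, for brevity in this sketch
        return bases.charAt(index);
    }

    void insert(int index, char base) { // in-place edit of the backing data
        bases.insert(index, base);
    }
}

// Hypothetical view that remembers only indices into its backing sequence.
class WindowView {
    private final MutableBases backing;
    private final int start; // inclusive
    private final int end;   // exclusive

    WindowView(MutableBases backing, int start, int end) {
        this.backing = backing;
        this.start = start;
        this.end = end;
    }

    String asString() {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; i++) {
            sb.append(backing.baseAt(i));
        }
        return sb.toString();
    }
}

class StaleViewDemo {
    // Returns what the view shows before and after the backing sequence is edited.
    static String[] run() {
        MutableBases seq = new MutableBases("AACCGGTT");
        WindowView view = new WindowView(seq, 2, 6); // intended to cover "CCGG"
        String before = view.asString();
        seq.insert(0, 'T');                          // nobody tells the view about this
        String after = view.asString();              // same indices, different bases
        return new String[] { before, after };
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println("before edit: " + result[0]);
        System.out.println("after edit:  " + result[1]);
    }
}
```

Avoiding this would require either an immutable underlying sequence (the
current model) or some notification mechanism so that views can adjust their
stored indices.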

Although these are really good points, I would still agree that having some
kind of mutable sequences would be a great thing, because mutating or
modifying sequences is a common task, and such applications might still
want or need to rely on a sequence framework that, e.g., checks that only
valid tokens are present or offers an implementation that can handle changes
in large sequences without having to copy everything to a new object, as
would be the case with simple String objects.

If other people agree that there is a need for that (I would be interested in
feedback here) and the community can agree on a way of implementing it
(without the disadvantages mentioned), I would be happy to help write the
corresponding code.

A separate EditableSequence interface and a tool class that converts
between Sequence and EditableSequence (without inheriting EditableSequence
from Sequence as I initially proposed) might be one option, although this
would make Sequence and EditableSequence less compatible. I think this would
have to be discussed, but it might really be worth it.
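
A rough sketch of how such a separation could look, in plain Java without any
BioJava dependency. Only the name EditableSequence comes from this thread; its
methods, the ImmutableDna and EditableDna classes, the SequenceConverter tool
class, the "ACGTN" alphabet and the 1-based positions are all assumptions made
up for this illustration and not an existing API.

```java
// Hypothetical read-only sequence, standing in for an existing Sequence implementation.
final class ImmutableDna {
    private final String bases;

    ImmutableDna(String bases) {
        this.bases = bases;
    }

    String asString() {
        return bases;
    }
}

// Interface name taken from the thread; the methods are assumptions of this sketch.
interface EditableSequence {
    int length();
    char getCompoundAt(int position);                 // 1-based in this sketch
    void setCompoundAt(int position, char compound);  // 1-based in this sketch
    String asString();
}

// Hypothetical editable implementation that only accepts valid DNA tokens.
final class EditableDna implements EditableSequence {
    private static final String VALID = "ACGTN";      // assumed alphabet
    private final StringBuilder bases;

    EditableDna(String initial) {
        for (int i = 0; i < initial.length(); i++) {
            check(initial.charAt(i));
        }
        this.bases = new StringBuilder(initial);
    }

    private static void check(char compound) {
        if (VALID.indexOf(compound) < 0) {
            throw new IllegalArgumentException("Invalid DNA compound: " + compound);
        }
    }

    public int length() {
        return bases.length();
    }

    public char getCompoundAt(int position) {
        return bases.charAt(position - 1);
    }

    public void setCompoundAt(int position, char compound) {
        check(compound);                              // validate before modifying
        bases.setCharAt(position - 1, compound);
    }

    public String asString() {
        return bases.toString();
    }
}

// Hypothetical tool class converting between the two unrelated types.
final class SequenceConverter {
    private SequenceConverter() {}

    static EditableSequence toEditable(ImmutableDna sequence) {
        return new EditableDna(sequence.asString());  // copies, so the original stays untouched
    }

    static ImmutableDna toImmutable(EditableSequence sequence) {
        return new ImmutableDna(sequence.asString()); // snapshot of the current state
    }
}
```

Because the two types do not share a hierarchy here, code that expects a
read-only sequence can never receive an object that changes under it; the
price is an explicit conversion (and a copy) at each boundary.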

Best
Ben


Paolo Pavan schrieb am 2015-03-30:
> Hi Ben and all,
> I'm following this thread with interest.
> Just to examine this in depth, what was the reason behind the idea of
> maintaining the sequence atomic? Is it to keep working with the same
> instantiated object (and hence its reference) during the software's run
> lifetime?
> If so, I like the idea you are suggesting of accomplishing the task of a
> DNA mutation with a SequenceView.

> Paolo

> 2015-03-30 16:36 GMT+02:00 Ben Stöver <benstoever at uni-muenster.de>:

> > Hi Jonas,

> > I proposed inheriting a subinterface "EditableSequence" (with
> > corresponding implementations) from the existing Sequence interface on
> > this list last November. Some people liked this idea, some did not,
> > mainly because there seemed to be concerns that existing code (using
> > BioJava) relies on the assumption of atomic sequences and allowing their
> > modification might break some of this code (at least this was my
> > interpretation of the concerns). (You can have a look at these mails in
> > some archive or I can forward them to you, if you want to have a closer
> > look at that discussion.)

> > To my knowledge it is indeed difficult to modify sequences in the current
> > architecture. The only way I'm aware of is creating a new SequenceView on
> > your sequence, which provides a modified view of the underlying sequence
> > modeling your mutation. I think there are even some implementations out
> > there based on this interface

> > https://github.com/biojava/biojava/blob/master/biojava-core/src/main/java/org/biojava/nbio/core/sequence/edits/Edit.java
> > but I never tried them. In my opinion, whether this approach makes sense
> > for you is mainly a question of performance. (If you e.g. perform many
> > mutations, you would not want to create a copy of your whole sequence for
> > each operation and have a chain of 1000 sequence views in the end.)

> > Of course you are always free to create or modify an existing
> > implementation of "Sequence" that offers additional methods for
> > modification, but keep in mind that this would break the assumption of
> > "atomic sequence objects", which seems to be intended in the current
> > BioJava sequence model.

> > Anyway, if anyone knows about any other ways to do that in BioJava or
> > could think of a good way of integrating this functionality in the
> > existing architecture (without building up an alternative sequence
> > framework), I would be very interested to know.

> > Best
> > Ben

> > Dipl. Biologe Ben Stöver
> > Evolution und Biodiversity of Plants Group
> > Institute for Evolution and Biodiversity
> > University of Münster
> > Germany
> > http://www2.ieb.uni-muenster.de/EvolBiodivPlants/en/People/Stoever
> > BenStoever at uni-muenster.de



> > LAW Andy schrieb am 2015-03-30:
> > > I think the philosophical view on this is that the mutated
> > > sequence
> > > is a *new* and *different* sequence.

> > > On 30 Mar 2015, at 09:30, Jose Manuel Duarte <jose.duarte at psi.ch>
> > > wrote:

> > > > Hi Jonas

> > > > I'm not very familiar with the sequence part of Biojava, but after
> > > > looking around a bit it seems that indeed there's no available way
> > > > to mutate sequences. It looks like people using Biojava before had
> > > > "read-only" applications in mind. I agree a setCompoundAt(int
> > > > position) would be needed, it should actually be part of the
> > > > Sequence interface. It would be a nice addition for 4.1.

> > > > Anyway sorry I can't be of more help, perhaps someone else has
> > > > some
> > > > more background info on this.

> > > > Jose



> > > > On 28.03.2015 17:13, Jonas Dehairs wrote:
> > > >> I want to introduce a mutation to a DNA sequence at a particular
> > > >> location.
> > > >> I can't seem to find a suitable method for this in the 4.0 API.
> > > >> What would make most sense to me is a setCompoundAt (int position,
> > > >> c compound) method in the AbstractSequence class, similar to the
> > > >> getCompoundAt(int position) method, but this doesn't seem to
> > > >> exist. And the mutator class seems to be for proteins only. How
> > > >> can I do this?




> > > --
> > > The University of Edinburgh is a charitable body, registered in
> > > Scotland, with registration number SC005336.


> > > _______________________________________________
> > > Biojava-l mailing list  -  Biojava-l at mailman.open-bio.org
> > > http://mailman.open-bio.org/mailman/listinfo/biojava-l




