[Biojava-l] Introducing a mutation in a DNA sequence

Paolo Pavan paolo.pavan at gmail.com
Wed Apr 1 22:23:41 UTC 2015


Thank you Mark. I think it would be better to clarify this point, as I may
have a different idea in mind.

Are we talking about a sequence object that, given a "parent" sequence, will
show the result of applying a set of mutation descriptors?
Should this result still be a Sequence object, so that any further processing
that takes an AbstractSequence as input can be applied to it (e.g. performing
a sequence alignment with SmithWaterman)?
Should this result be the same Sequence object that was given as input,
which, through some mechanism still to be implemented, would show a sequence
string that differs from the original by the applied mutation descriptors?

If so, why not implement it with SequenceView, the same mechanism by which we
get a reverse-complemented sequence?
If this were done, there would be no need for a new EditableSequence
interface and conversion to/from Sequence, or am I wrong?
Ben, could you clarify your concerns about not having such a design?
Why do you still see advantages in a mutable implementation of Sequence
instead?
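
To make the view idea concrete, here is a minimal, self-contained sketch. It
does not use BioJava's actual SequenceView API: ReadOnlySeq, StringSeq,
PointMutationView and SeqTools are hypothetical names invented for this
illustration, and the 1-based positions are an assumption of the sketch. It
only covers substitutions; the parent is never modified or copied, and the
view can be passed to any code that accepts the read-only type.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical minimal read-only sequence type; NOT BioJava's Sequence. */
interface ReadOnlySeq {
    int length();

    /** Returns the residue at a 1-based position (an assumption of this sketch). */
    char compoundAt(int position);

    /** Builds the full sequence string by reading every position in order. */
    default String asString() {
        StringBuilder sb = new StringBuilder(length());
        for (int i = 1; i <= length(); i++) {
            sb.append(compoundAt(i));
        }
        return sb.toString();
    }
}

/** Immutable "parent" sequence backed by a String. */
final class StringSeq implements ReadOnlySeq {
    private final String residues;

    StringSeq(String residues) {
        this.residues = residues;
    }

    public int length() {
        return residues.length();
    }

    public char compoundAt(int position) {
        return residues.charAt(position - 1);
    }
}

/** View that overlays substitutions on a parent without modifying it. */
final class PointMutationView implements ReadOnlySeq {
    private final ReadOnlySeq parent;
    private final Map<Integer, Character> substitutions;

    PointMutationView(ReadOnlySeq parent, Map<Integer, Character> substitutions) {
        // Reject positions that do not exist in the parent.
        for (int pos : substitutions.keySet()) {
            if (pos < 1 || pos > parent.length()) {
                throw new IndexOutOfBoundsException("Position out of range: " + pos);
            }
        }
        this.parent = parent;
        // Defensive copy so later changes to the caller's map do not leak in.
        this.substitutions = Collections.unmodifiableMap(new TreeMap<>(substitutions));
    }

    public int length() {
        return parent.length(); // substitutions never change the length
    }

    public char compoundAt(int position) {
        Character replaced = substitutions.get(position);
        return replaced != null ? replaced : parent.compoundAt(position);
    }
}

/** Example of downstream code that works on parents and views alike. */
final class SeqTools {
    private SeqTools() {}

    static int countMismatches(ReadOnlySeq a, ReadOnlySeq b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Lengths differ");
        }
        int mismatches = 0;
        for (int i = 1; i <= a.length(); i++) {
            if (a.compoundAt(i) != b.compoundAt(i)) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

With a parent "ATGCATGC" and a single substitution mapping position 3 to 'T',
the view reads "ATTCATGC", the parent still reads "ATGCATGC", and
countMismatches(parent, view) returns 1. Insertions and deletions would also
need a mapping between view and parent coordinates, which this sketch leaves
out.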

2015-04-01 19:13 GMT+02:00 Mark Fortner <phidias51 at gmail.com>:

> Just out of curiosity, could mutations be applied as annotations to a
> wild-type sequence? The sequence would remain unedited, but you would still
> be able to represent the mutation and related annotations.  This might work
> for SNPs, and indels, but I'm not sure how you would deal with chromosomal
> translocations.
>
> Also, would it be useful to be able to reference external variant
> databases like ClinVar or SwissVar when specifying a mutation?
>
> Regards,
>
> Mark
>
>
> On Wed, Apr 1, 2015 at 9:20 AM, Ben Stöver <benstoever at uni-muenster.de>
> wrote:
>
>> Hi Paolo and all,
>>
>> yes, I guess that is the reason. Imagine a SequenceView implementation that
>> stores indices of the underlying sequence to make its modifications. If the
>> underlying sequence could be modified, the indices in the view would become
>> invalid and all views of a Sequence would have to be notified about the
>> change (which would require implementing an observer pattern in Sequence,
>> which is currently not present). I guess the need for this logic change was
>> the reason for keeping Sequence implementations atomic. But maybe Andreas
>> could comment on this, because that's just my interpretation of his opinion.
>>
>> Although these are really good points, I would still agree that having some
>> kind of mutable sequences would be a great thing, because mutating or
>> modifying sequences is a common task and such applications might still
>> want/need to rely on a sequence framework, which e.g. checks that only
>> valid tokens are present or offers an implementation that can handle
>> changes in large sequences without having to copy everything to a new
>> object, as would be the case with simple String objects.
>>
>> If other people agree that there is a need for that (I would be interested
>> in feedback here) and the community can agree on a way of implementing it
>> (without the disadvantages mentioned), I would be happy to help write the
>> corresponding code.
>>
>> A different EditableSequence interface and a tool class that converts
>> between Sequence and EditableSequence (without inheriting EditableSequence
>> from Sequence as I initially proposed) might be one option, although this
>> would make Sequence and EditableSequence less compatible. I think this
>> would have to be discussed, but it might really be worth it.
>>
>> Best
>> Ben
>>
>>
>> Paolo Pavan schrieb am 2015-03-30:
>> > Hi Ben and all,
>> > I'm following this thread with interest.
>> > Just to look at this in more depth: what was the reason behind the idea
>> > of keeping the sequence atomic? Is it to keep working with the same
>> > instantiated object (and hence its reference) during the lifetime of the
>> > software run?
>> > If so, I like the idea that you are suggesting of accomplishing the task
>> > of a DNA mutation with a SequenceView.
>>
>> > Paolo
>>
>> > 2015-03-30 16:36 GMT+02:00 Ben Stöver <benstoever at uni-muenster.de>:
>>
>> > > Hi Jonas,
>>
>> > > Last November I proposed on this list to derive a subinterface
>> > > "EditableSequence" (with corresponding implementations) from the
>> > > existing Sequence interface. Some people liked this idea, some did
>> > > not, mainly because there seemed to be concerns that existing code
>> > > (using BioJava) relies on the assumption of atomic sequences and that
>> > > allowing their modification might break some of this code (at least
>> > > this was my interpretation of the concerns). (You can have a look at
>> > > these mails in some archive, or I can forward them to you if you want
>> > > to have a closer look at that discussion.)
>>
>> > > To my knowledge it is indeed difficult to modify sequences in the
>> > > current architecture. The only way I'm aware of is creating a new
>> > > SequenceView on your sequence, which provides a modified view of the
>> > > underlying sequence that models your mutation. I think there are even
>> > > some implementations out there based on this interface
>> > >
>> > > https://github.com/biojava/biojava/blob/master/biojava-core/src/main/java/org/biojava/nbio/core/sequence/edits/Edit.java
>> > >
>> > > but I never tried them. In my opinion, whether this approach makes
>> > > sense for you is mainly a question of performance. (If you e.g.
>> > > perform many mutations, you would not want to create a copy of your
>> > > whole sequence for each operation and have a chain of 1000 sequence
>> > > views in the end.)
>>
>> > > Of course you are always free to create or modify an existing
>> > > implementation of "Sequence" that offers additional methods for
>> > > modification, but keep in mind that this would break the assumption of
>> > > "atomic sequence objects", which seems to be intended in the current
>> > > BioJava sequence model.
>>
>> > > Anyway, if anyone knows of any other way to do that in BioJava or can
>> > > think of a good way of integrating this functionality into the
>> > > existing architecture (without building up an alternative sequence
>> > > framework), I would be very interested to know.
>>
>> > > Best
>> > > Ben
>>
>> > > Dipl. Biologe Ben Stöver
>> > > Evolution und Biodiversity of Plants Group
>> > > Institute for Evolution and Biodiversity
>> > > University of Münster
>> > > Germany
>> > > http://www2.ieb.uni-muenster.de/EvolBiodivPlants/en/People/Stoever
>> > > BenStoever at uni-muenster.de
>>
>>
>>
>> > > LAW Andy schrieb am 2015-03-30:
>> > > > I think the philosophical view on this is that the mutated sequence
>> > > > is a *new* and *different* sequence.
>>
>> > > > On 30 Mar 2015, at 09:30, Jose Manuel Duarte <jose.duarte at psi.ch>
>> > > > wrote:
>>
>> > > > > Hi Jonas
>>
>> > > > > I'm not very familiar with the sequence part of Biojava, but after
>> > > > > looking around a bit it seems that indeed there's no available way
>> > > > > to mutate sequences. It looks like people using Biojava before had
>> > > > > "read-only" applications in mind. I agree a setCompoundAt(int
>> > > > > position) would be needed; it should actually be part of the
>> > > > > Sequence interface. It would be a nice addition for 4.1.
>>
>> > > > > Anyway, sorry I can't be of more help; perhaps someone else has
>> > > > > some more background info on this.
>>
>> > > > > Jose
>>
>>
>>
>> > > > > On 28.03.2015 17:13, Jonas Dehairs wrote:
>> > > > >> I want to introduce a mutation to a DNA sequence at a particular
>> > > > >> location. I can't seem to find a suitable method for this in the
>> > > > >> 4.0 API. What would make most sense to me is a setCompoundAt (int
>> > > > >> position, c compound) method in the AbstractSequence class,
>> > > > >> similar to the getCompoundAt(int position) method, but this
>> > > > >> doesn't seem to exist. And the mutator class seems to be for
>> > > > >> proteins only. How can I do this?
>>
>>
>>
>>
>> > > > --
>> > > > The University of Edinburgh is a charitable body, registered in
>> > > > Scotland, with registration number SC005336.
>>
>>
>> > > > _______________________________________________
>> > > > Biojava-l mailing list  -  Biojava-l at mailman.open-bio.org
>> > > > http://mailman.open-bio.org/mailman/listinfo/biojava-l
>>
>>
>>
>>
>
>
>