[Biojava-l] Introducing a mutation in a DNA sequence

Andreas Prlic andreas at sdsc.edu
Thu Apr 2 04:34:10 UTC 2015


Hi,

I agree with Ben's summary. The basic philosophy is that sequences are not
mutable.  It is clear that we need some mechanism to introduce mutations in
sequences, without having to allocate a copy of the sequence in memory.

About Mark's suggestion: I think Paolo's comment to represent mutations via
a "SequenceView" goes in a similar direction.

I hear two suggestions for how to do this so far:

A) Mutations via a SequenceView
B) Introduction of an EditableSequence interface.

Ben: Could you comment a bit further why you would not have an
EditableSequence interface extend from Sequence?
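
As an illustration of option A, the following is a minimal sketch in plain
Java that does not use the BioJava API. SimpleSeq, StringSeq,
PointMutationView, SeqText and PointMutationDemo are hypothetical names made
up for this example, and the real SequenceView interface may look quite
different. The only point shown is that a view can overlay substitutions on
an unmodified parent, so no copy of the parent sequence is allocated.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Small demo: the wild type stays untouched, the view shows the mutation.
final class PointMutationDemo {
    public static void main(String[] args) {
        SimpleSeq wildType = new StringSeq("ACGT");
        SimpleSeq mutant = new PointMutationView(wildType, Collections.singletonMap(2, 'T'));
        System.out.println(SeqText.render(wildType)); // ACGT
        System.out.println(SeqText.render(mutant));   // ATGT
    }
}

// Hypothetical read-only sequence; a stand-in for BioJava's Sequence.
interface SimpleSeq {
    int getLength();
    char getCompoundAt(int position); // 1-based position
}

// Immutable backing sequence.
final class StringSeq implements SimpleSeq {
    private final String residues;

    StringSeq(String residues) { this.residues = residues; }

    public int getLength() { return residues.length(); }

    public char getCompoundAt(int position) { return residues.charAt(position - 1); }
}

// Option A: a view that overlays point substitutions on an unmodified parent.
final class PointMutationView implements SimpleSeq {
    private final SimpleSeq parent;
    private final Map<Integer, Character> substitutions;

    PointMutationView(SimpleSeq parent, Map<Integer, Character> substitutions) {
        for (int position : substitutions.keySet()) {
            if (position < 1 || position > parent.getLength()) {
                throw new IndexOutOfBoundsException("position " + position);
            }
        }
        this.parent = parent;
        this.substitutions = new HashMap<>(substitutions); // defensive copy
    }

    public int getLength() { return parent.getLength(); }

    public char getCompoundAt(int position) {
        // Use the substituted compound if there is one, else ask the parent.
        Character replaced = substitutions.get(position);
        return replaced != null ? replaced : parent.getCompoundAt(position);
    }
}

final class SeqText {
    // Renders any SimpleSeq as a String by reading it position by position.
    static String render(SimpleSeq seq) {
        StringBuilder sb = new StringBuilder(seq.getLength());
        for (int i = 1; i <= seq.getLength(); i++) {
            sb.append(seq.getCompoundAt(i));
        }
        return sb.toString();
    }
}
```

Because the view is itself a SimpleSeq, it can be passed to any code that
only reads sequences. This sketch covers substitutions only; insertions and
deletions change the length and would need index translation in the view.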

==

Having said that, sequence manipulation is currently possible via "edits".
However, I suspect this is too complicated from an API perspective?

From EditSequenceTest:

public void substitute() throws CompoundNotFoundException {
  // Positions are 1-based. Each edit() call returns a new sequence;
  // seq itself is never modified, so every assertion starts from "ACGT".
  DNASequence seq = new DNASequence("ACGT");
  assertSeq(new Edit.Substitute<NucleotideCompound>("T", 2).edit(seq), "ATGT");
  assertSeq(new Edit.Substitute<NucleotideCompound>("TT", 2).edit(seq), "ATTT");
  assertSeq(new Edit.Substitute<NucleotideCompound>("T", 1).edit(seq), "TCGT");
  assertSeq(new Edit.Substitute<NucleotideCompound>("TTC", 2).edit(seq), "ATTC");
}

.edit() uses the JoiningSequenceReader under the hood, which has a
getCompoundAt method.
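
To make the joining idea concrete, here is a minimal sketch in plain Java.
It is not BioJava's JoiningSequenceReader or Edit.Substitute: Residues,
Literal, Window, Joined and SubstituteSketch are hypothetical names, and the
real classes may behave differently, especially in edge cases. The sketch
assumes that a substitution can be expressed as "prefix of the original +
replacement + suffix of the original", where prefix and suffix are windows
onto the original rather than copies, and it reproduces the four results of
the test above.

```java
import java.util.Arrays;
import java.util.List;

// Small demo using the same inputs as the substitute() test.
final class SubstituteDemo {
    public static void main(String[] args) {
        Residues seq = new Literal("ACGT");
        System.out.println(SubstituteSketch.render(SubstituteSketch.substitute(seq, "T", 2)));   // ATGT
        System.out.println(SubstituteSketch.render(SubstituteSketch.substitute(seq, "TTC", 2))); // ATTC
    }
}

// Hypothetical read-only sequence with 1-based positions.
interface Residues {
    int length();
    char at(int position);
}

// Sequence backed directly by a String.
final class Literal implements Residues {
    private final String s;
    Literal(String s) { this.s = s; }
    public int length() { return s.length(); }
    public char at(int position) { return s.charAt(position - 1); }
}

// Window onto parent positions start..end (inclusive); nothing is copied.
final class Window implements Residues {
    private final Residues parent;
    private final int start;
    private final int end;

    Window(Residues parent, int start, int end) {
        this.parent = parent;
        this.start = start;
        this.end = end;
    }

    public int length() { return Math.max(0, end - start + 1); }

    public char at(int position) {
        if (position < 1 || position > length()) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        return parent.at(start + position - 1);
    }
}

// Concatenation of several parts, resolved lazily on each lookup.
final class Joined implements Residues {
    private final List<Residues> parts;

    Joined(Residues... parts) { this.parts = Arrays.asList(parts.clone()); }

    public int length() {
        int total = 0;
        for (Residues part : parts) total += part.length();
        return total;
    }

    public char at(int position) {
        if (position < 1) throw new IndexOutOfBoundsException("position " + position);
        int offset = position;
        for (Residues part : parts) {
            if (offset <= part.length()) return part.at(offset);
            offset -= part.length(); // skip this part and look in the next one
        }
        throw new IndexOutOfBoundsException("position " + position);
    }
}

final class SubstituteSketch {
    // Replaces replacement.length() residues of target, starting at 1-based position start.
    static Residues substitute(Residues target, String replacement, int start) {
        int end = start + replacement.length() - 1;
        if (start < 1 || end > target.length()) {
            throw new IndexOutOfBoundsException("range " + start + ".." + end);
        }
        return new Joined(
                new Window(target, 1, start - 1),
                new Literal(replacement),
                new Window(target, end + 1, target.length()));
    }

    static String render(Residues r) {
        StringBuilder sb = new StringBuilder(r.length());
        for (int i = 1; i <= r.length(); i++) sb.append(r.at(i));
        return sb.toString();
    }
}
```

In this sketch every edit adds one more layer of indirection, so a long chain
of edits makes each lookup walk through all layers. That is the performance
concern about chains of views raised elsewhere in this thread.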

Andreas







On Wed, Apr 1, 2015 at 3:23 PM, Paolo Pavan <paolo.pavan at gmail.com> wrote:

> Thank you Mark. I think it would be better to clarify this point, since I
> may have a different idea in mind.
>
> Are we talking about a sequence object that, given a "parent" sequence,
> shows the result of applying a set of mutation descriptors?
> Should this result still be a Sequence object, so that it is possible to
> apply any further processing that takes an AbstractSequence as input
> (e.g. performing a sequence alignment with SmithWaterman)?
> Should this result be the same Sequence object that was given as input,
> which, through some mechanism yet to be implemented, shows a sequence
> string that differs from the original as a result of applying the mutation
> descriptors?
>
> If so, why not implement it with SequenceView, the same mechanism by which
> we get a reverse-complemented sequence?
> If this is accomplished, there would be no need for a new EditableSequence
> interface and conversion to/from Sequence, or am I wrong?
> Ben, could you clarify your concerns about not having such a design? Why
> do you still see advantages in a mutable implementation of Sequence
> instead?
>
> 2015-04-01 19:13 GMT+02:00 Mark Fortner <phidias51 at gmail.com>:
>
>> Just out of curiosity, could mutations be applied as annotations to a
>> wild-type sequence? The sequence would remain unedited, but you would still
>> be able to represent the mutation and related annotations.  This might work
>> for SNPs, and indels, but I'm not sure how you would deal with chromosomal
>> translocations.
>>
>> Also, would it be useful to be able to reference external variant
>> databases like ClinVar or SwissVar when specifying a mutation?
>>
>> Regards,
>>
>> Mark
>>
>>
>> On Wed, Apr 1, 2015 at 9:20 AM, Ben Stöver <benstoever at uni-muenster.de>
>> wrote:
>>
>>> Hi Paolo and all,
>>>
>>> yes, I guess that is the reason. Imagine a SequenceView implementation
>>> that stores indices of the underlying sequence to make its modifications.
>>> If the underlying sequence could be modified, the indices in the view
>>> would become invalid and all views of a Sequence would have to be
>>> notified about the change (which would require the implementation of an
>>> observer pattern in Sequence, which is currently not present). I guess
>>> the need for this logic change was the reason for keeping Sequence
>>> implementations atomic. But maybe Andreas could comment on this, because
>>> that's just my interpretation of his opinion.
>>>
>>> Although these are really good points, I would still agree that having
>>> some kind of mutable sequence would be a great thing, because mutating
>>> or modifying sequences is a common task, and such applications might
>>> still want or need to rely on a sequence framework that, e.g., checks
>>> that only valid tokens are present or offers an implementation that can
>>> handle changes in large sequences without having to copy everything to a
>>> new object, as would be the case with simple String objects.
>>>
>>> If other people agree that there is a need for that (I would be
>>> interested in feedback here) and the community can agree on a way of
>>> implementing it (without the disadvantages mentioned), I would be happy
>>> to help create the corresponding code.
>>>
>>> A separate EditableSequence interface and a tool class that converts
>>> between Sequence and EditableSequence (without inheriting
>>> EditableSequence from Sequence as I initially proposed) might be one
>>> option, although this would make Sequence and EditableSequence less
>>> compatible. I think this would have to be discussed, but it might really
>>> be worth it.
>>>
>>> Best
>>> Ben
>>>
>>>
>>> Paolo Pavan schrieb am 2015-03-30:
>>> > Hi Ben and all,
>>> > I'm following this thread with interest.
>>> > Just to examine this in depth: what was the reason for keeping the
>>> > sequence atomic? Was it to keep working with the same instantiated
>>> > object (and hence its reference) during the lifetime of a software run?
>>> > If so, I like the idea you are suggesting of accomplishing a DNA
>>> > mutation with a SequenceView.
>>>
>>> > Paolo
>>>
>>> > 2015-03-30 16:36 GMT+02:00 Ben Stöver <benstoever at uni-muenster.de>:
>>>
>>> > > Hi Jonas,
>>>
>>> > > Last November I proposed on this list to derive a subinterface
>>> > > "EditableSequence" (with corresponding implementations) from the
>>> > > existing Sequence interface. Some people liked this idea, some did
>>> > > not, mainly because there seemed to be concerns that existing code
>>> > > (using BioJava) relies on the assumption of atomic sequences and
>>> > > that allowing their modification might break some of this code (at
>>> > > least this was my interpretation of the concerns). (You can have a
>>> > > look at these mails in some archive, or I can forward them to you
>>> > > if you want to have a closer look at that discussion.)
>>>
>>> > > To my knowledge it is indeed difficult to modify sequences in the
>>> > > current architecture. The only way I'm aware of is creating a new
>>> > > SequenceView on your sequence, which provides a modified view of the
>>> > > underlying sequence modeling your mutation. I think there are even
>>> > > some implementations out there based on this interface
>>> > >
>>> https://github.com/biojava/biojava/blob/master/biojava-core/src/main/java/org/biojava/nbio/core/sequence/edits/Edit.java
>>> > >
>>> > > but I never tried them. In my opinion, whether this approach makes
>>> > > sense for you is mainly a question of performance. (If you e.g.
>>> > > perform many mutations, you would not want to create a copy of your
>>> > > whole sequence for each operation and have a chain of 1000 sequence
>>> > > views in the end.)
>>>
>>> > > Of course you are always free to create a new implementation of
>>> > > "Sequence", or modify an existing one, that offers additional
>>> > > methods for modification, but keep in mind that this would break
>>> > > the assumption of "atomic sequence objects", which seems to be
>>> > > intended in the current BioJava sequence model.
>>>
>>> > > Anyway, if anyone knows of any other way to do that in BioJava or
>>> > > can think of a good way of integrating this functionality into the
>>> > > existing architecture (without building up an alternative sequence
>>> > > framework), I would be very interested to know.
>>>
>>> > > Best
>>> > > Ben
>>>
>>> > > Dipl. Biologe Ben Stöver
>>> > > Evolution und Biodiversity of Plants Group
>>> > > Institute for Evolution and Biodiversity
>>> > > University of Münster
>>> > > Germany
>>> > > http://www2.ieb.uni-muenster.de/EvolBiodivPlants/en/People/Stoever
>>> > > BenStoever at uni-muenster.de
>>>
>>>
>>>
>>> > > LAW Andy schrieb am 2015-03-30:
>>> > > > I think the philosophical view on this is that the mutated
>>> > > > sequence is a *new* and *different* sequence.
>>>
>>> > > > On 30 Mar 2015, at 09:30, Jose Manuel Duarte <jose.duarte at psi.ch>
>>> > > > wrote:
>>>
>>> > > > > Hi Jonas
>>>
>>> > > > > I'm not very familiar with the sequence part of BioJava, but
>>> > > > > after looking around a bit it seems that there is indeed no
>>> > > > > available way to mutate sequences. It looks like people using
>>> > > > > BioJava before had "read-only" applications in mind. I agree a
>>> > > > > setCompoundAt(int position) would be needed; it should actually
>>> > > > > be part of the Sequence interface. It would be a nice addition
>>> > > > > for 4.1.
>>>
>>> > > > > Anyway, sorry I can't be of more help. Perhaps someone else
>>> > > > > has some more background info on this.
>>>
>>> > > > > Jose
>>>
>>>
>>>
>>> > > > > On 28.03.2015 17:13, Jonas Dehairs wrote:
>>> > > > >> I want to introduce a mutation to a DNA sequence at a
>>> > > > >> particular location. I can't seem to find a suitable method
>>> > > > >> for this in the 4.0 API. What would make most sense to me is
>>> > > > >> a setCompoundAt(int position, C compound) method in the
>>> > > > >> AbstractSequence class, similar to the getCompoundAt(int
>>> > > > >> position) method, but this doesn't seem to exist. And the
>>> > > > >> mutator class seems to be for proteins only. How can I do
>>> > > > >> this?
>>>
>>>
>>>
>>>
>>> > > > --
>>> > > > The University of Edinburgh is a charitable body, registered in
>>> > > > Scotland, with registration number SC005336.
>>>
>>>
>>> > > > _______________________________________________
>>> > > > Biojava-l mailing list  -  Biojava-l at mailman.open-bio.org
>>> > > > http://mailman.open-bio.org/mailman/listinfo/biojava-l



-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
RCSB PDB Protein Data Bank
University of California, San Diego

Editor Software Section
PLOS Computational Biology

BioJava Project Lead
-----------------------------------------------------------------------