<div dir="ltr">Hi,<div><br></div><div>I agree with Ben&#39;s summary. The basic philosophy is that sequences are not mutable.  It is clear that we need some mechanism to introduce mutations in sequences, without having to allocate a copy of the sequence in memory.</div><div><br></div><div>About Mark&#39;s suggestion: I think Paolo&#39;s comment to represent mutations via a &quot;SequenceView&quot; goes in a similar direction.</div><div><br></div><div>I hear two suggestions for how to do this so far:</div><div><br></div><div>A) Mutations via a SequenceView</div><div>B) introduction of an EditableSequence interface.</div><div><br></div><div>Ben: Could you comment a bit further why you would not have an EditableSequence interface extend from Sequence? </div><div><br></div><div>== </div><div><br></div><div>Having said that, currently sequence manipulation is possible via &quot;edits&quot;, however I suspect this is too complicated from an API perspective?</div><div><br></div><div>From EditSequenceTest :</div><div><pre style="color:rgb(0,0,0);font-family:Menlo;font-size:12pt"><span style="color:rgb(0,0,128);font-weight:bold">public void </span>substitute() <span style="color:rgb(0,0,128);font-weight:bold">throws </span>CompoundNotFoundException {<br>  DNASequence seq = <span style="color:rgb(0,0,128);font-weight:bold">new </span>DNASequence(<span style="color:rgb(0,128,0);font-weight:bold">&quot;ACGT&quot;</span>);<br>  assertSeq(<span style="color:rgb(0,0,128);font-weight:bold">new </span>Edit.Substitute&lt;NucleotideCompound&gt;(<span style="color:rgb(0,128,0);font-weight:bold">&quot;T&quot;</span>, <span style="color:rgb(0,0,255)">2</span>).edit(seq), <span style="color:rgb(0,128,0);font-weight:bold">&quot;ATGT&quot;</span>);<br>  assertSeq(<span style="color:rgb(0,0,128);font-weight:bold">new </span>Edit.Substitute&lt;NucleotideCompound&gt;(<span style="color:rgb(0,128,0);font-weight:bold">&quot;TT&quot;</span>, <span style="color:rgb(0,0,255)">2</span>).edit(seq), 
<span style="color:rgb(0,128,0);font-weight:bold">&quot;ATTT&quot;</span>);<br>  assertSeq(<span style="color:rgb(0,0,128);font-weight:bold">new </span>Edit.Substitute&lt;NucleotideCompound&gt;(<span style="color:rgb(0,128,0);font-weight:bold">&quot;T&quot;</span>, <span style="color:rgb(0,0,255)">1</span>).edit(seq), <span style="color:rgb(0,128,0);font-weight:bold">&quot;TCGT&quot;</span>);<br>  assertSeq(<span style="color:rgb(0,0,128);font-weight:bold">new </span>Edit.Substitute&lt;NucleotideCompound&gt;(<span style="color:rgb(0,128,0);font-weight:bold">&quot;TTC&quot;</span>, <span style="color:rgb(0,0,255)">2</span>).edit(seq), <span style="color:rgb(0,128,0);font-weight:bold">&quot;ATTC&quot;</span>);<br>}</pre>.edit() uses the JoiningSequenceReader under the hood, which has a getCompoundAt method.</div><div><br></div><div>Andreas</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 1, 2015 at 3:23 PM, Paolo Pavan <span dir="ltr">&lt;<a href="mailto:paolo.pavan@gmail.com" target="_blank">paolo.pavan@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thank you Mark. I think it would be better to clarify this point, as I may have a different idea in mind.<div><br></div><div>Are we talking about a sequence object that, given a &quot;parent&quot; sequence, will show the result of applying a set of mutation descriptors?<div>Should this result still be a Sequence object, so that it will be possible to apply any further processing that takes an AbstractSequence as input? 
(e.g. performing a sequence alignment with SmithWaterman)</div><div>Or should the result be the same Sequence object that was given as input which, through some mechanism yet to be implemented, will show a sequence string that differs from the original as a result of applying the mutation descriptors?</div><div><br></div><div>If so, why not implement it with SequenceView, the same mechanism we use to get a reverse-complemented sequence? </div><div>If this is accomplished, there will be no need for a new<span style="font-size:23px"> interface EditableSequence and conversion to/from</span> <span style="font-size:23px">Sequence, or am I wrong?</span></div></div><div><span style="font-size:23px">Ben, could you clarify your concerns about this design? Why do you still see advantages in a mutable implementation of Sequence instead?</span></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">2015-04-01 19:13 GMT+02:00 Mark Fortner <span dir="ltr">&lt;<a href="mailto:phidias51@gmail.com" target="_blank">phidias51@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Just out of curiosity, could mutations be applied as annotations to a wild-type sequence? The sequence would remain unedited, but you would still be able to represent the mutation and related annotations.  This might work for SNPs and indels, but I&#39;m not sure how you would deal with chromosomal translocations.<div><br></div><div>Also, would it be useful to be able to reference external variant databases like ClinVar or SwissVar when specifying a mutation?<div class="gmail_extra"><br clear="all"><div><div><div>Regards,</div><div><br></div>Mark<br><br></div></div><div><div>

<br><div class="gmail_quote">On Wed, Apr 1, 2015 at 9:20 AM, Ben Stöver <span dir="ltr">&lt;<a href="mailto:benstoever@uni-muenster.de" target="_blank">benstoever@uni-muenster.de</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Paolo and all,<br>

<br>

yes, I guess that is the reason. Imagine a SequenceView implementation that<br>

stores indices of the underlying sequence to make its modifications. If the<br>

underlying sequence could be modified the indices in the view would become<br>

invalid and all views of a Sequence would have to be notified<br>

about the change (which would require the implementation of an observer<br>

pattern in Sequence, which is currently not present). I guess the need for<br>

this logic change was the reason for keeping Sequence implementations atomic.<br>

But maybe Andreas could comment on this, because that&#39;s just my interpretation<br>

of his opinion.<br>

<br>
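A minimal sketch of the index problem described above, assuming hypothetical<br>
types (Seq, MutableSeq, SubstitutionView, SeqText) that are not part of BioJava:<br>
a view stores only a 1-based position and a replacement compound, so an<br>
insertion into a mutable parent silently shifts what the view replaces.<br>
<pre>
```java
// Hypothetical types for illustration only; none of these are BioJava classes.
interface Seq {
    int length();
    char getCompoundAt(int position); // 1-based positions
}

// A deliberately mutable parent, to show what immutability protects against.
final class MutableSeq implements Seq {
    private final StringBuilder data;

    MutableSeq(String compounds) {
        this.data = new StringBuilder(compounds);
    }

    public int length() {
        return data.length();
    }

    public char getCompoundAt(int position) {
        return data.charAt(position - 1);
    }

    // Inserts a compound so that it ends up at the given 1-based position.
    void insert(int position, char compound) {
        data.insert(position - 1, compound);
    }
}

// A view that copies nothing: it stores one position and one replacement.
final class SubstitutionView implements Seq {
    private final Seq parent;
    private final int position;
    private final char compound;

    SubstitutionView(Seq parent, int position, char compound) {
        this.parent = parent;
        this.position = position;
        this.compound = compound;
    }

    public int length() {
        return parent.length();
    }

    public char getCompoundAt(int p) {
        return p == position ? compound : parent.getCompoundAt(p);
    }
}

// Small helper that renders any Seq as a String.
final class SeqText {
    static String of(Seq seq) {
        StringBuilder sb = new StringBuilder();
        for (int p = 1; p <= seq.length(); p++) {
            sb.append(seq.getCompoundAt(p));
        }
        return sb.toString();
    }
}
```
</pre>
With a parent of ACGT and a view substituting T at position 2, the view reads<br>
ATGT. After inserting G at position 1 of the parent, the same view reads GTCGT:<br>
the stored index now replaces the A instead of the original C. An immutable<br>
parent rules this out without any observer mechanism.<br>
<br>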

Although these are really good points, I would anyway agree that having some<br>

kind of mutable sequences would be a great thing, because mutating or<br>

modifying sequences is a common task and such applications might anyway<br>

want/need to rely on a sequence framework, which e.g. checks that only valid<br>

tokens are present or offers an implementation that can handle changes in<br>

large sequences without having to copy everything to a new object, as<br>

would be the case with simple String objects.<br>

<br>

If other people agree that there is need for that (I would be interested in<br>

feedback here) and the community would agree on a way of implementing that<br>

(without the disadvantages mentioned), I would be happy to help<br>

write the corresponding code.<br>

<br>

A different EditableSequence interface and a utility class that converts<br>

between Sequence and EditableSequence (without inheriting EditableSequence<br>

from Sequence as I initially proposed) might be one option, although this<br>

would make Sequence and EditableSequence less compatible. I think this would<br>

have to be discussed, but it might really be worth it.<br>

<br>
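A minimal sketch of this option, assuming hypothetical names (ReadOnlySeq,<br>
EditableSeq, FixedSeq, BufferSeq, SeqConverter) that stand in for Sequence,<br>
EditableSequence and the converter class; none of them exist in BioJava, and<br>
plain chars are used in place of compounds to keep the example short.<br>
<pre>
```java
// Hypothetical sketch; the names follow the thread, not existing BioJava types.
interface ReadOnlySeq {
    int length();
    char getCompoundAt(int position); // 1-based positions
}

// Deliberately NOT a subtype of ReadOnlySeq, so code that expects an
// unchanging sequence can never be handed an editable one by accident.
interface EditableSeq {
    int length();
    char getCompoundAt(int position);
    void setCompoundAt(int position, char compound);
}

final class FixedSeq implements ReadOnlySeq {
    private final String data;

    FixedSeq(String compounds) {
        this.data = compounds;
    }

    public int length() {
        return data.length();
    }

    public char getCompoundAt(int position) {
        return data.charAt(position - 1);
    }

    @Override
    public String toString() {
        return data;
    }
}

final class BufferSeq implements EditableSeq {
    private final StringBuilder data;

    BufferSeq(String compounds) {
        this.data = new StringBuilder(compounds);
    }

    public int length() {
        return data.length();
    }

    public char getCompoundAt(int position) {
        return data.charAt(position - 1);
    }

    public void setCompoundAt(int position, char compound) {
        data.setCharAt(position - 1, compound);
    }

    @Override
    public String toString() {
        return data.toString();
    }
}

// The converter copies in both directions, so neither side can observe
// later changes made through the other.
final class SeqConverter {
    static EditableSeq toEditable(ReadOnlySeq seq) {
        StringBuilder sb = new StringBuilder();
        for (int p = 1; p <= seq.length(); p++) {
            sb.append(seq.getCompoundAt(p));
        }
        return new BufferSeq(sb.toString());
    }

    static ReadOnlySeq toReadOnly(EditableSeq seq) {
        StringBuilder sb = new StringBuilder();
        for (int p = 1; p <= seq.length(); p++) {
            sb.append(seq.getCompoundAt(p));
        }
        return new FixedSeq(sb.toString());
    }
}
```
</pre>
The copy made on each conversion is the price of keeping the two types<br>
separate: existing code that takes the read-only type keeps its guarantee,<br>
but an editable sequence cannot be passed to it directly.<br>
<br>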

Best<br>

<span><font color="#888888">Ben<br>

</font></span><div><div><br>

<br>

Paolo Pavan schrieb am 2015-03-30:<br>

&gt; Hi Ben and all,<br>

&gt; I&#39;m following this thread with interest.<br>

&gt; Just to look at this in more depth: what was the reason for keeping<br>

&gt; the sequence atomic? Is it to keep working with the same instantiated<br>

&gt; object (and hence its reference) for the lifetime of the program run?<br>

&gt; If so, I like the idea you are suggesting of accomplishing the task<br>

&gt; of a DNA mutation with a SequenceView.<br>

<br>

&gt; Paolo<br>

<br>

&gt; 2015-03-30 16:36 GMT+02:00 Ben Stöver &lt;<a href="mailto:benstoever@uni-muenster.de" target="_blank">benstoever@uni-muenster.de</a>&gt;:<br>

<br>

&gt; &gt; Hi Jonas,<br>

<br>

&gt; &gt; I have been proposing to inherit a subinterface &quot;EditableSequence&quot;<br>

&gt; &gt; (with<br>

&gt; &gt; according implementations) from the existing Sequence interface on<br>

&gt; &gt; this<br>

&gt; &gt; list<br>

&gt; &gt; last November. Some people liked this idea, some did not, mainly<br>

&gt; &gt; because<br>

&gt; &gt; there<br>

&gt; &gt; seemed to be concerns that existing code (using BioJava) relies on<br>

&gt; &gt; the<br>

&gt; &gt; assumption of atomic sequences and allowing their modification<br>

&gt; &gt; might break<br>

&gt; &gt; some of this code (at least this was my interpretation of the<br>

&gt; &gt; concerns).<br>

&gt; &gt; (You<br>

&gt; &gt; can have a look at these mails in some archive or I can forward<br>

&gt; &gt; them to<br>

&gt; &gt; you,<br>

&gt; &gt; if you want to have a closer look at that discussion.)<br>

<br>

&gt; &gt; To my knowledge it is indeed difficult to modify sequences in the<br>

&gt; &gt; current<br>

&gt; &gt; architecture. The only way I&#39;m aware of is to create a new<br>

&gt; &gt; SequenceView on your sequence, which provides a modified view of<br>

&gt; &gt; the underlying sequence that models your mutation. I think there are<br>

&gt; &gt; even some implementations<br>

&gt; &gt; out<br>

&gt; &gt; there<br>

&gt; &gt; based on this interface<br>

<br>

&gt; &gt; <a href="https://github.com/biojava/biojava/blob/master/biojava-core/src/main/java/org/biojava/nbio/core/sequence/edits/Edit.java" target="_blank">https://github.com/biojava/biojava/blob/master/biojava-core/src/main/java/org/biojava/nbio/core/sequence/edits/Edit.java</a><br>

&gt; &gt; but I never tried them. In my opinion, whether this approach makes<br>

&gt; &gt; sense for you is mainly a question of performance. (If you e.g.<br>

&gt; &gt; perform many<br>

&gt; &gt; mutations<br>

&gt; &gt; you<br>

&gt; &gt; would not want to create a copy of your whole sequence for each<br>

&gt; &gt; operation<br>

&gt; &gt; and<br>

&gt; &gt; have a chain of 1000 sequence views in the end.)<br>

<br>

&gt; &gt; Of course you are always free to create or modify an existing<br>

&gt; &gt; implementation<br>

&gt; &gt; of &quot;Sequence&quot; that offer additional methods for modification, but<br>

&gt; &gt; keep in<br>

&gt; &gt; mind<br>

&gt; &gt; that this would break the assumption of &quot;atomic sequence objects&quot;,<br>

&gt; &gt; which<br>

&gt; &gt; seems<br>

&gt; &gt; to be intended in the current BioJava sequence model.<br>

<br>

&gt; &gt; Anyway, if anyone knows about any other ways to do that in BioJava<br>

&gt; &gt; or could<br>

&gt; &gt; think about a good way of integrating this functionality in the<br>

&gt; &gt; existing<br>

&gt; &gt; architecture (without building up an alternative sequence<br>

&gt; &gt; framework), I<br>

&gt; &gt; would<br>

&gt; &gt; be very interested to know.<br>

<br>

&gt; &gt; Best<br>

&gt; &gt; Ben<br>

<br>

&gt; &gt; Dipl. Biologe Ben Stöver<br>

&gt; &gt; Evolution und Biodiversity of Plants Group<br>

&gt; &gt; Institute for Evolution and Biodiversity<br>

&gt; &gt; University of Münster<br>

&gt; &gt; Germany<br>

&gt; &gt; <a href="http://www2.ieb.uni-muenster.de/EvolBiodivPlants/en/People/Stoever" target="_blank">http://www2.ieb.uni-muenster.de/EvolBiodivPlants/en/People/Stoever</a><br>

&gt; &gt; <a href="mailto:BenStoever@uni-muenster.de" target="_blank">BenStoever@uni-muenster.de</a><br>

<br>

<br>

<br>

&gt; &gt; LAW Andy schrieb am 2015-03-30:<br>

&gt; &gt; &gt; I think the philosophical view on this is that the mutated<br>

&gt; &gt; &gt; sequence<br>

&gt; &gt; &gt; is a *new* and *different* sequence.<br>

<br>

&gt; &gt; &gt; On 30 Mar 2015, at 09:30, Jose Manuel Duarte &lt;<a href="mailto:jose.duarte@psi.ch" target="_blank">jose.duarte@psi.ch</a>&gt;<br>

&gt; &gt; &gt; wrote:<br>

<br>

&gt; &gt; &gt; &gt; Hi Jonas<br>

<br>

&gt; &gt; &gt; &gt; I&#39;m not very familiar with the sequence part of Biojava, but<br>

&gt; &gt; &gt; &gt; after<br>

&gt; &gt; &gt; &gt; looking around a bit it seems that indeed there&#39;s no available<br>

&gt; &gt; &gt; &gt; way<br>

&gt; &gt; &gt; &gt; to mutate sequences. It looks like people using Biojava before<br>

&gt; &gt; &gt; &gt; had<br>

&gt; &gt; &gt; &gt; &quot;read-only&quot; applications in mind. I agree a setCompoundAt(int<br>

&gt; &gt; &gt; &gt; position) would be needed, it should actually be part of the<br>

&gt; &gt; &gt; &gt; Sequence interface. It would be a nice addition for 4.1.<br>

<br>

&gt; &gt; &gt; &gt; Anyway sorry I can&#39;t be of more help, perhaps someone else has<br>

&gt; &gt; &gt; &gt; some<br>

&gt; &gt; &gt; &gt; more background info on this.<br>

<br>

&gt; &gt; &gt; &gt; Jose<br>

<br>

<br>

<br>

&gt; &gt; &gt; &gt; On 28.03.2015 17:13, Jonas Dehairs wrote:<br>

&gt; &gt; &gt; &gt;&gt; I want to introduce a mutation to a DNA sequence at a<br>

&gt; &gt; &gt; &gt;&gt; particular<br>

&gt; &gt; &gt; &gt;&gt; location.<br>

&gt; &gt; &gt; &gt;&gt; I can&#39;t seem to find a suitable method for this in the 4.0<br>

&gt; &gt; &gt; &gt;&gt; API.<br>

&gt; &gt; &gt; &gt;&gt; What would make most sense to me is a setCompoundAt (int<br>

&gt; &gt; &gt; &gt;&gt; position,<br>

&gt; &gt; &gt; &gt;&gt; c compound) method in the AbstractSequence class, similar to<br>

&gt; &gt; &gt; &gt;&gt; the<br>

&gt; &gt; &gt; &gt;&gt; getCompoundAt(int position) method, but this doesn&#39;t seem to<br>

&gt; &gt; &gt; &gt;&gt; exist. And the mutator class seems to be for proteins only.<br>

&gt; &gt; &gt; &gt;&gt; How<br>

&gt; &gt; &gt; &gt;&gt; can I do this?<br>

<br>

<br>

<br>

<br>

&gt; &gt; &gt; --<br>

&gt; &gt; &gt; The University of Edinburgh is a charitable body, registered in<br>

&gt; &gt; &gt; Scotland, with registration number SC005336.<br>

<br>

<br>

&gt; &gt; &gt; _______________________________________________<br>

&gt; &gt; &gt; Biojava-l mailing list  -  <a href="mailto:Biojava-l@mailman.open-bio.org" target="_blank">Biojava-l@mailman.open-bio.org</a><br>

&gt; &gt; &gt; <a href="http://mailman.open-bio.org/mailman/listinfo/biojava-l" target="_blank">http://mailman.open-bio.org/mailman/listinfo/biojava-l</a><br>

<br>


<br>

<br>


</div></div></blockquote></div><br></div></div></div></div></div>

</blockquote></div><br></div>

</div></div><br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr">-----------------------------------------------------------------------<br>Dr. Andreas Prlic<br>RCSB PDB Protein Data Bank<br>University of California, San Diego<div><br></div><div>Editor Software Section <br><div>PLOS Computational Biology<div><div><div><br></div><div>BioJava Project Lead<br>-----------------------------------------------------------------------<br></div></div></div></div></div></div></div>

</div>