<div dir="ltr">Hi,<div><br></div><div>I agree with Ben's summary. The basic philosophy is that sequences are not mutable. It is clear that we need some mechanism to introduce mutations in sequences, without having to allocate a copy of the sequence in memory.</div><div><br></div><div>About Mark's suggestion: I think Paolo's comment to represent mutations via a "SequenceView" goes in a similar direction.</div><div><br></div><div>I hear two suggestions for how to do this so far:</div><div><br></div><div>A) Mutations via a SequenceView</div><div>B) introduction of an EditableSequence interface.</div><div><br></div><div>Ben: Could you comment a bit further why you would not have an EditableSequence interface extend from Sequence? </div><div><br></div><div>== </div><div><br></div><div>Having said that, currently sequence manipulation is possible via "edits", however I suspect this is too complicated from an API perspective?</div><div><br></div><div>From EditSequenceTest :</div><div><pre style="color:rgb(0,0,0);font-family:Menlo;font-size:12pt"><span style="color:rgb(0,0,128);font-weight:bold">public void </span>substitute() <span style="color:rgb(0,0,128);font-weight:bold">throws </span>CompoundNotFoundException {<br> DNASequence seq = <span style="color:rgb(0,0,128);font-weight:bold">new </span>DNASequence(<span style="color:rgb(0,128,0);font-weight:bold">"ACGT"</span>);<br> assertSeq(<span style="color:rgb(0,0,128);font-weight:bold">new </span>Edit.Substitute<NucleotideCompound>(<span style="color:rgb(0,128,0);font-weight:bold">"T"</span>, <span style="color:rgb(0,0,255)">2</span>).edit(seq), <span style="color:rgb(0,128,0);font-weight:bold">"ATGT"</span>);<br> assertSeq(<span style="color:rgb(0,0,128);font-weight:bold">new </span>Edit.Substitute<NucleotideCompound>(<span style="color:rgb(0,128,0);font-weight:bold">"TT"</span>, <span style="color:rgb(0,0,255)">2</span>).edit(seq), <span style="color:rgb(0,128,0);font-weight:bold">"ATTT"</span>);<br> assertSeq(<span 
style="color:rgb(0,0,128);font-weight:bold">new </span>Edit.Substitute<NucleotideCompound>(<span style="color:rgb(0,128,0);font-weight:bold">"T"</span>, <span style="color:rgb(0,0,255)">1</span>).edit(seq), <span style="color:rgb(0,128,0);font-weight:bold">"TCGT"</span>);<br> assertSeq(<span style="color:rgb(0,0,128);font-weight:bold">new </span>Edit.Substitute<NucleotideCompound>(<span style="color:rgb(0,128,0);font-weight:bold">"TTC"</span>, <span style="color:rgb(0,0,255)">2</span>).edit(seq), <span style="color:rgb(0,128,0);font-weight:bold">"ATTC"</span>);<br>}</pre>.edit() is using the JoiningSequenceReader under the hood which has a getCompoundAt method.</div><div><br></div><div>Andreas</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 1, 2015 at 3:23 PM, Paolo Pavan <span dir="ltr"><<a href="mailto:paolo.pavan@gmail.com" target="_blank">paolo.pavan@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thank you Mark, I think it should be better to clarify this point, I may have a different idea in my mind.<div><br></div><div>Are we talking about a sequence object that given a "parent" sequence will show the result of applying a set of mutations descriptors?<div>Should this result still be a Sequence object such that it will be possible to apply any further processing that takes a AbstractSequence in input? (e.g.:performing a sequence alignment with SmithWaterman)</div><div>Should this result be the same Sequence object instantiated given in input which, with some mechanism to implement, will show a sequence string different from the original resulting by applying mutation descriptors?</div><div><br></div><div>If it is so, why do not implement it with SequenceView, the same mechanism we get a reverse complemented sequence? 
</div><div>If this can be accomplished, there will be no need for a new interface EditableSequence and conversion to/from Sequence, or am I wrong?</div></div><div>Ben, could you clarify your concerns with such a design? Why do you still see advantages in a mutable implementation of Sequence instead?</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">2015-04-01 19:13 GMT+02:00 Mark Fortner <span dir="ltr"><<a href="mailto:phidias51@gmail.com" target="_blank">phidias51@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Just out of curiosity, could mutations be applied as annotations to a wild-type sequence? The sequence would remain unedited, but you would still be able to represent the mutation and related annotations. This might work for SNPs and indels, but I'm not sure how you would deal with chromosomal translocations.<div><br></div><div>Also, would it be useful to be able to reference external variant databases like ClinVar or SwissVar when specifying a mutation?<div class="gmail_extra"><br clear="all"><div><div><div>Regards,</div><div><br></div>Mark<br><br></div></div><div><div>
<br><div class="gmail_quote">On Wed, Apr 1, 2015 at 9:20 AM, Ben Stöver <span dir="ltr"><<a href="mailto:benstoever@uni-muenster.de" target="_blank">benstoever@uni-muenster.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Paolo and all,<br>
<br>
yes, I guess that is the reason. Imagine a SequenceView implementation that<br>
stores indices of the underlying sequence to make its modifications. If the<br>
underlying sequence could be modified the indices in the view would become<br>
invalid and all views of a Sequence would have to be notified<br>
about the change (which would require the implementation of an observer<br>
pattern in Sequence, which is currently not present). I guess the need for<br>
this logic change was the reason for keeping Sequence implementations atomic.<br>
But maybe Andreas could comment on this, because that's just my interpretation<br>
of his opinion.<br>
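<br>
To illustrate the point about stored indices, here is a minimal sketch in plain<br>
Java. It does not use the BioJava API: SimpleSequence, StringSequence,<br>
SubstitutionView and SequencePrinter are hypothetical names invented for this<br>
example. The view stores one position into an immutable parent and overlays a<br>
single replacement, so nothing is copied. That stored position stays valid only<br>
because the parent can never change.<br>

```java
// Hypothetical types for illustration only; these are not BioJava classes.
interface SimpleSequence {
    int length();
    char charAt(int position); // 1-based, like the substitute() test quoted in this thread
}

// Immutable backing sequence: there is no way to change it after construction.
final class StringSequence implements SimpleSequence {
    private final String residues;

    StringSequence(String residues) {
        this.residues = residues;
    }

    public int length() {
        return residues.length();
    }

    public char charAt(int position) {
        return residues.charAt(position - 1);
    }
}

// A view that shows the parent with one position replaced, without copying it.
final class SubstitutionView implements SimpleSequence {
    private final SimpleSequence parent;
    private final int position;     // index into the parent, fixed at creation
    private final char replacement;

    SubstitutionView(SimpleSequence parent, int position, char replacement) {
        if (position < 1 || position > parent.length()) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        this.parent = parent;
        this.position = position;
        this.replacement = replacement;
    }

    public int length() {
        return parent.length();
    }

    public char charAt(int p) {
        // Only the overlaid position differs; everything else is delegated.
        return p == position ? replacement : parent.charAt(p);
    }
}

final class SequencePrinter {
    private SequencePrinter() {}

    // Materialises any SimpleSequence as a String, for display or testing.
    static String asString(SimpleSequence s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 1; i <= s.length(); i++) {
            sb.append(s.charAt(i));
        }
        return sb.toString();
    }
}
```

With these types, new SubstitutionView(new StringSequence("ACGT"), 2, 'T') reads<br>
as ATGT while the parent still reads as ACGT. If the parent allowed insertions<br>
or deletions, the stored position 2 could silently point at a different residue,<br>
which is the notification problem described above.<br>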
<br>
Although these are really good points, I would still agree that having some<br>
kind of mutable sequence would be a great thing, because mutating or<br>
modifying sequences is a common task, and such applications may still<br>
want or need to rely on a sequence framework that, for example, checks that<br>
only valid tokens are present or offers an implementation that can handle<br>
changes in large sequences without copying everything to a new object, as<br>
would be the case with simple String objects.<br>
<br>
If other people agree that there is a need for that (I would be interested in<br>
feedback here) and the community can agree on a way of implementing it<br>
(without the disadvantages mentioned), I would be happy to help<br>
write the corresponding code.<br>
<br>
A different EditableSequence interface and a tool class that can convert<br>
between Sequence and EditableSequence (without inheriting EditableSequence<br>
from Sequence as I initially proposed) might be one option, although this<br>
would make Sequence and EditableSequence less compatible. I think this would<br>
have to be discussed, but it might really be worth it.<br>
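<br>
The option above could look roughly like the following sketch. Everything in it<br>
is hypothetical: ReadOnlySequence, EditableSequence, ImmutableDna, EditableDna<br>
and SequenceConverter are illustrative names, not BioJava types, and a<br>
StringBuilder stands in for whatever storage a real implementation would use.<br>
EditableSequence deliberately does not extend the read-only interface, so code<br>
written against immutable sequences can never be handed a mutable one.<br>

```java
// Hypothetical sketch; none of these types exist in BioJava.
interface ReadOnlySequence {
    int length();
    char compoundAt(int position); // 1-based
}

// Deliberately NOT a subinterface of ReadOnlySequence.
interface EditableSequence {
    int length();
    char compoundAt(int position);
    void setCompoundAt(int position, char compound);
}

final class ImmutableDna implements ReadOnlySequence {
    private final String residues;

    ImmutableDna(String residues) {
        this.residues = residues;
    }

    public int length() {
        return residues.length();
    }

    public char compoundAt(int position) {
        return residues.charAt(position - 1);
    }

    @Override
    public String toString() {
        return residues;
    }
}

final class EditableDna implements EditableSequence {
    private final StringBuilder residues;

    EditableDna(String residues) {
        this.residues = new StringBuilder(residues);
    }

    public int length() {
        return residues.length();
    }

    public char compoundAt(int position) {
        return residues.charAt(position - 1);
    }

    public void setCompoundAt(int position, char compound) {
        // The framework, not the caller, checks that only valid tokens are stored.
        if ("ACGT".indexOf(compound) < 0) {
            throw new IllegalArgumentException("Not a DNA base: " + compound);
        }
        residues.setCharAt(position - 1, compound); // in place, no full copy
    }

    @Override
    public String toString() {
        return residues.toString();
    }
}

// The "tool class" that converts between the two worlds. Each conversion copies.
final class SequenceConverter {
    private SequenceConverter() {}

    static EditableSequence toEditable(ReadOnlySequence source) {
        StringBuilder sb = new StringBuilder(source.length());
        for (int i = 1; i <= source.length(); i++) {
            sb.append(source.compoundAt(i));
        }
        return new EditableDna(sb.toString());
    }

    static ReadOnlySequence toReadOnly(EditableSequence source) {
        StringBuilder sb = new StringBuilder(source.length());
        for (int i = 1; i <= source.length(); i++) {
            sb.append(source.compoundAt(i));
        }
        return new ImmutableDna(sb.toString());
    }
}
```

In this sketch the copy happens only at the two conversion points, so many<br>
edits in between cost no additional copies. The price is the reduced<br>
compatibility mentioned above: an EditableSequence cannot be passed directly to<br>
code that expects the read-only type.<br>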
<br>
Best<br>
<span><font color="#888888">Ben<br>
</font></span><div><div><br>
<br>
Paolo Pavan schrieb am 2015-03-30:<br>
> Hi Ben and all,<br>
> I'm following this thread with interest.<br>
> To look at this in more depth: what was the reason for keeping the<br>
> sequence atomic? Is it so that code keeps working with the same instantiated<br>
> object (and hence its reference) for the lifetime of the program?<br>
> If so, I like the idea you are suggesting of accomplishing the<br>
> task of a DNA mutation with a SequenceView.<br>
<br>
> Paolo<br>
<br>
> 2015-03-30 16:36 GMT+02:00 Ben Stöver <<a href="mailto:benstoever@uni-muenster.de" target="_blank">benstoever@uni-muenster.de</a>>:<br>
<br>
> > Hi Jonas,<br>
<br>
> > Last November I proposed on this list to derive a subinterface<br>
> > "EditableSequence" (with corresponding implementations) from the existing<br>
> > Sequence interface. Some people liked this idea, some did not, mainly<br>
> > because there seemed to be concerns that existing code (using BioJava)<br>
> > relies on the assumption of atomic sequences and that allowing their<br>
> > modification might break some of this code (at least this was my<br>
> > interpretation of the concerns). (You can have a look at these mails in<br>
> > some archive, or I can forward them to you if you want to have a closer<br>
> > look at that discussion.)<br>
<br>
> > To my knowledge it is indeed difficult to modify sequences in the current<br>
> > architecture. The only way I'm aware of is creating a new SequenceView on<br>
> > your sequence, which provides a modified view of the underlying sequence<br>
> > modeling your mutation. I think there are even some implementations out<br>
> > there based on this interface<br>
<br>
> > <a href="https://github.com/biojava/biojava/blob/master/biojava-core/src/main/java/org/biojava/nbio/core/sequence/edits/Edit.java" target="_blank">https://github.com/biojava/biojava/blob/master/biojava-core/src/main/java/org/biojava/nbio/core/sequence/edits/Edit.java</a><br>
> > but I never tried them. In my opinion, whether this approach makes sense<br>
> > for you is mainly a question of performance. (If you perform many<br>
> > mutations, for example, you would not want to create a copy of your whole<br>
> > sequence for each operation and end up with a chain of 1000 sequence views.)<br>
<br>
> > Of course you are always free to create an implementation of "Sequence",<br>
> > or modify an existing one, so that it offers additional methods for<br>
> > modification, but keep in mind that this would break the assumption of<br>
> > "atomic sequence objects", which seems to be intended in the current<br>
> > BioJava sequence model.<br>
<br>
> > Anyway, if anyone knows of any other way to do that in BioJava or can<br>
> > think of a good way of integrating this functionality into the existing<br>
> > architecture (without building up an alternative sequence framework), I<br>
> > would be very interested to know.<br>
<br>
> > Best<br>
> > Ben<br>
<br>
> > Dipl. Biologe Ben Stöver<br>
> > Evolution und Biodiversity of Plants Group<br>
> > Institute for Evolution and Biodiversity<br>
> > University of Münster<br>
> > Germany<br>
> > <a href="http://www2.ieb.uni-muenster.de/EvolBiodivPlants/en/People/Stoever" target="_blank">http://www2.ieb.uni-muenster.de/EvolBiodivPlants/en/People/Stoever</a><br>
> > <a href="mailto:BenStoever@uni-muenster.de" target="_blank">BenStoever@uni-muenster.de</a><br>
<br>
<br>
<br>
> > LAW Andy schrieb am 2015-03-30:<br>
> > > I think the philosophical view on this is that the mutated<br>
> > > sequence<br>
> > > is a *new* and *different* sequence.<br>
<br>
> > > On 30 Mar 2015, at 09:30, Jose Manuel Duarte <<a href="mailto:jose.duarte@psi.ch" target="_blank">jose.duarte@psi.ch</a>><br>
> > > wrote:<br>
<br>
> > > > Hi Jonas<br>
<br>
> > > > I'm not very familiar with the sequence part of Biojava, but<br>
> > > > after<br>
> > > > looking around a bit it seems that indeed there's no available<br>
> > > > way<br>
> > > > to mutate sequences. It looks like people using Biojava before<br>
> > > > had<br>
> > > > "read-only" applications in mind. I agree a setCompoundAt(int<br>
> > > > position) would be needed, it should actually be part of the<br>
> > > > Sequence interface. It would be a nice addition for 4.1.<br>
<br>
> > > > Anyway sorry I can't be of more help, perhaps someone else has<br>
> > > > some<br>
> > > > more background info on this.<br>
<br>
> > > > Jose<br>
<br>
<br>
<br>
> > > > On 28.03.2015 17:13, Jonas Dehairs wrote:<br>
> > > >> I want to introduce a mutation to a DNA sequence at a particular<br>
> > > >> location. I can't seem to find a suitable method for this in the 4.0<br>
> > > >> API. What would make most sense to me is a setCompoundAt(int position,<br>
> > > >> C compound) method in the AbstractSequence class, similar to the<br>
> > > >> getCompoundAt(int position) method, but this doesn't seem to exist.<br>
> > > >> And the mutator class seems to be for proteins only. How can I do this?<br>
<br>
<br>
<br>
<br>
> > > --<br>
> > > The University of Edinburgh is a charitable body, registered in<br>
> > > Scotland, with registration number SC005336.<br>
<br>
<br>
> > > _______________________________________________<br>
> > > Biojava-l mailing list - <a href="mailto:Biojava-l@mailman.open-bio.org" target="_blank">Biojava-l@mailman.open-bio.org</a><br>
> > > <a href="http://mailman.open-bio.org/mailman/listinfo/biojava-l" target="_blank">http://mailman.open-bio.org/mailman/listinfo/biojava-l</a><br>
<br>
<br>
<br>
</div></div></blockquote></div><br></div></div></div></div></div>
<br></blockquote></div><br></div>
</div></div><br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr">-----------------------------------------------------------------------<br>Dr. Andreas Prlic<br>RCSB PDB Protein Data Bank<br>University of California, San Diego<div><br></div><div>Editor Software Section <br><div>PLOS Computational Biology<div><div><div><br></div><div>BioJava Project Lead<br>-----------------------------------------------------------------------<br></div></div></div></div></div></div></div>
</div>