<div dir="ltr"><div><div><div>Hi!<br></div>Sorry for the giant text in my previous, I don't know what happened.<br></div>Please find a comment below.<br></div>Paolo<br><div><div><div><div><div class="gmail_extra"><br><div class="gmail_quote">2015-04-02 11:02 GMT+02:00 Ben Stöver <span dir="ltr"><<a href="mailto:benstoever@uni-muenster.de" target="_blank">benstoever@uni-muenster.de</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Although this probably works perfectly fine for a lot of tasks, I think it<br>
would have disadvantages when sequentially applying lots of mutations/edits to<br>
a sequence (e.g. in a (GUI-based) alignment editor with Sequence objects as<br>
data backend or an application that simulates evolution of sequences along a<br>
large phylogenetic tree by sequentially applying mutations). In such cases the<br>
resulting sequence (containing all mutations/edits) would be a sequence view<br>
on top of a stack of other sequence views (each of them defining one<br>
mutation). Depending on the index, a call of getCompoundAt() would lead to a<br>
trace back of the whole stack in the worst case (if that compound was present<br>
in the initial sequence). (I hope this was more understandable this time?)<br></blockquote><div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
For such applications it would be nice to have a Sequence implementation<br>
(extending interface) or anything similar that is able to really edit the<br>
underlying sequence without creating a stack of views with increasing size. I<br>
have some applications that would benefit from this (e.g.<br>
<a href="http://bioinfweb.info/LibrAlign/" target="_blank">http://bioinfweb.info/LibrAlign/</a> ), but of course I would like to ask the<br>
community how relevant such a feature would be in general.<br></blockquote><div> </div><div>Your project looks very interesting!<br>I don't think a stack of views is strictly necessary if the mutation objects (say we define
Mutation, an interface describing a single mutation) are stored in a collection
inside the Sequence object itself. Every subsequent edit could then be added
to the others in the same Sequence object.<br></div><div><br>Moreover, just as an idea, a SequenceView could let you retrieve only the part of the mutated sequence string that falls within a specified range, which may speed up rendering.<br><br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
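</blockquote><div>To make this more concrete, here is a rough sketch of what I have in mind. It only covers point substitutions and is plain Java, not BioJava code: the class MutatedSequenceSketch and its methods substitute(), getCompoundAt() and getSubSequence() are names I made up for illustration, and positions are 1-based as in the Edit.Substitute test quoted further down. All edits end up in one sorted collection inside the same object, so a lookup is a single binary search rather than a walk through a stack of views.<br></div>

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not BioJava API: a read-only parent sequence plus a
 * sorted collection of point substitutions stored in the same object.
 */
final class MutatedSequenceSketch {
    private final String parent;            // original sequence, never modified
    private int[] positions = new int[4];   // 1-based mutated positions, kept sorted
    private char[] compounds = new char[4]; // replacement compound for each position
    private int count = 0;                  // number of stored substitutions

    MutatedSequenceSketch(String parent) {
        this.parent = parent;
    }

    /** Records a substitution; a later edit at the same position replaces the earlier one. */
    void substitute(int position, char compound) {
        if (position < 1 || position > parent.length()) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        int idx = Arrays.binarySearch(positions, 0, count, position);
        if (idx >= 0) {
            compounds[idx] = compound;
            return;
        }
        int insertAt = -idx - 1;
        if (count == positions.length) {
            // Grow both arrays together so they stay aligned.
            positions = Arrays.copyOf(positions, count * 2);
            compounds = Arrays.copyOf(compounds, count * 2);
        }
        // Shift the tail one slot to the right to keep the positions sorted.
        System.arraycopy(positions, insertAt, positions, insertAt + 1, count - insertAt);
        System.arraycopy(compounds, insertAt, compounds, insertAt + 1, count - insertAt);
        positions[insertAt] = position;
        compounds[insertAt] = compound;
        count++;
    }

    /** Returns the mutated compound if one is stored, otherwise the parent's compound. */
    char getCompoundAt(int position) {
        int idx = Arrays.binarySearch(positions, 0, count, position);
        return idx >= 0 ? compounds[idx] : parent.charAt(position - 1);
    }

    /** Renders only the 1-based, inclusive range [start, end]. */
    String getSubSequence(int start, int end) {
        StringBuilder sb = new StringBuilder(end - start + 1);
        for (int p = start; p <= end; p++) {
            sb.append(getCompoundAt(p));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MutatedSequenceSketch seq = new MutatedSequenceSketch("ACGT");
        seq.substitute(2, 'T');
        System.out.println(seq.getSubSequence(1, 4)); // prints ATGT
    }
}
```

<div>Insertions and deletions would shift the positions of everything after them, so they would need a smarter index mapping than this; I have not thought that part through.<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">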
<br>
<br>
@Andreas: Why would I not have an EditableSequence interface extend from<br>
Sequence?<br>
<br>
Generally I would have nothing against this, because all methods from Sequence<br>
could be inherited and implementations of EditableSequence could be passed to<br>
methods which have parameters of type Sequence. (This was my initial idea, I<br>
posted in November.) Of course that would still violate the idea of atomic<br>
sequences a bit, because methods with parameters of type Sequence would have<br>
to check if the passed objects also implement EditableSequence to know that<br>
they cannot assume their contents to be immutable. In that case such methods<br>
could e.g. throw an exception if they cannot handle EditableSequences, but<br>
each method out there would have to implement this behavior.<br>
<br>
Best<br>
<span class=""><font color="#888888">Ben<br>
</font></span><div class=""><div class="h5"><br>
<br>
Andreas Prlic schrieb am 2015-04-02:<br>
> Hi,<br>
<br>
> I agree with Ben's summary. The basic philosophy is that sequences are not<br>
> mutable. It is clear that we need some mechanism to introduce mutations in<br>
> sequences, without having to allocate a copy of the sequence in memory.<br>
<br>
> About Mark's suggestion: I think Paolo's comment to represent mutations via<br>
> a "SequenceView" goes in a similar direction.<br>
<br>
> I hear two suggestions for how to do this so far:<br>
<br>
> A) Mutations via a SequenceView<br>
> B) introduction of an EditableSequence interface.<br>
<br>
> Ben: Could you comment a bit further why you would not have an<br>
> EditableSequence interface extend from Sequence?<br>
<br>
> ==<br>
<br>
> Having said that, currently sequence manipulation is possible via "edits",<br>
> however I suspect this is too complicated from an API perspective?<br>
<br>
> From EditSequenceTest:<br>
<br>
> public void substitute() throws CompoundNotFoundException {<br>
> DNASequence seq = new DNASequence("ACGT");<br>
> assertSeq(new Edit.Substitute<NucleotideCompound>("T", 2).edit(seq), "ATGT");<br>
> assertSeq(new Edit.Substitute<NucleotideCompound>("TT", 2).edit(seq), "ATTT");<br>
> assertSeq(new Edit.Substitute<NucleotideCompound>("T", 1).edit(seq), "TCGT");<br>
> assertSeq(new Edit.Substitute<NucleotideCompound>("TTC", 2).edit(seq), "ATTC");<br>
> }<br>
<br>
> .edit() is using the JoiningSequenceReader under the hood which has a<br>
> getCompoundAt method.<br>
<br>
> Andreas<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
> On Wed, Apr 1, 2015 at 3:23 PM, Paolo Pavan <<a href="mailto:paolo.pavan@gmail.com">paolo.pavan@gmail.com</a>><br>
> wrote:<br>
<br>
> > Thank you Mark, I think it would be better to clarify this point, I may<br>
> > have a different idea in my mind.<br>
<br>
> > Are we talking about a sequence object that, given a "parent" sequence, will<br>
> > show the result of applying a set of mutation descriptors?<br>
> > Should this result still be a Sequence object, such that it will be<br>
> > possible to apply any further processing that takes an AbstractSequence as<br>
> > input (e.g. performing a sequence alignment with SmithWaterman)?<br>
> > Should this result be the same Sequence object that was given as input,<br>
> > which, through some mechanism yet to be implemented, will show a sequence string<br>
> > different from the original as a result of applying the mutation descriptors?<br>
<br>
> > If so, why not implement it with a SequenceView, the same mechanism<br>
> > by which we get a reverse-complemented sequence?<br>
> > If this is accomplished, there will be no need for a new interface<br>
> > EditableSequence and conversion to/from Sequence, or am I wrong?<br>
> > Ben, could you clarify your concerns about such a<br>
> > design? Why do you still see advantages in a mutable implementation of<br>
> > Sequence instead?<br>
<br>
> > 2015-04-01 19:13 GMT+02:00 Mark Fortner <<a href="mailto:phidias51@gmail.com">phidias51@gmail.com</a>>:<br>
<br>
> >> Just out of curiosity, could mutations be applied as annotations to a<br>
> >> wild-type sequence? The sequence would remain unedited, but you would still<br>
> >> be able to represent the mutation and related annotations. This might work<br>
> >> for SNPs and indels, but I'm not sure how you would deal with chromosomal<br>
> >> translocations.<br>
<br>
> >> Also, would it be useful to be able to reference external variant<br>
> >> databases like ClinVar or SwissVar when specifying a mutation?<br>
<br>
> >> Regards,<br>
<br>
> >> Mark<br>
<br>
<br>
> >> On Wed, Apr 1, 2015 at 9:20 AM, Ben Stöver<br>
> >> <<a href="mailto:benstoever@uni-muenster.de">benstoever@uni-muenster.de</a>><br>
> >> wrote:<br>
<br>
> >>> Hi Paolo and all,<br>
<br>
> >>> yes, I guess that is the reason. Imagine a SequenceView implementation that<br>
> >>> stores indices of the underlying sequence to make its modifications. If the<br>
> >>> underlying sequence could be modified, the indices in the view would become<br>
> >>> invalid and all views of a Sequence would have to be notified<br>
> >>> about the change (which would require the implementation of an observer<br>
> >>> pattern in Sequence, which is currently not present). I guess the need for<br>
> >>> this logic change was the reason for keeping Sequence implementations atomic.<br>
> >>> But maybe Andreas could comment on this, because that's just my interpretation<br>
> >>> of his opinion.<br>
<br>
> >>> Although these are really good points, I would anyway agree that having some<br>
> >>> kind of mutable sequences would be a great thing, because mutating or<br>
> >>> modifying sequences is a common task and such applications might anyway<br>
> >>> want/need to rely on a sequence framework, which e.g. checks that only valid<br>
> >>> tokens are present or offers an implementation that can handle changes in<br>
> >>> large sequences without having to copy everything to a new object, as<br>
> >>> would be the case with simple String objects.<br>
<br>
> >>> If other people agree that there is a need for that (I would be interested in<br>
> >>> feedback here) and the community would agree on a way of implementing that<br>
> >>> (without having the disadvantages mentioned), I would be happy to help<br>
> >>> create the corresponding code.<br>
<br>
> >>> A different EditableSequence interface and a tool class that can convert<br>
> >>> between Sequence and EditableSequence (without inheriting EditableSequence<br>
> >>> from Sequence as I initially proposed) might be one option, although this<br>
> >>> would make Sequence and EditableSequence less compatible. I think this would<br>
> >>> have to be discussed, but it might really be worth it.<br>
<br>
> >>> Best<br>
> >>> Ben<br>
<br>
<br>
> >>> Paolo Pavan schrieb am 2015-03-30:<br>
> >>> > Hi Ben and all,<br>
> >>> > I'm following this thread with interest.<br>
> >>> > Just to examine this in depth, what was the reason behind the idea of<br>
> >>> > keeping the sequence atomic? Is it to keep working with the same instantiated<br>
> >>> > object (and hence its reference) during the software run lifetime?<br>
> >>> > If so, I like the idea you are suggesting yourself, to accomplish the<br>
> >>> > task of a DNA mutation with a SequenceView.<br>
<br>
> >>> > Paolo<br>
<br>
> >>> > 2015-03-30 16:36 GMT+02:00 Ben Stöver<br>
> >>> > <<a href="mailto:benstoever@uni-muenster.de">benstoever@uni-muenster.de</a>>:<br>
<br>
> >>> > > Hi Jonas,<br>
<br>
> >>> > > I proposed inheriting a subinterface "EditableSequence" (with<br>
> >>> > > corresponding implementations) from the existing Sequence interface on this list<br>
> >>> > > last November. Some people liked this idea, some did not, mainly because there<br>
> >>> > > seemed to be concerns that existing code (using BioJava) relies on the<br>
> >>> > > assumption of atomic sequences and allowing their modification might break<br>
> >>> > > some of this code (at least this was my interpretation of the concerns). (You<br>
> >>> > > can have a look at these mails in some archive or I can forward them to you,<br>
> >>> > > if you want to have a closer look at that discussion.)<br>
<br>
> >>> > > To my knowledge it is indeed difficult to modify sequences in the current<br>
> >>> > > architecture. The only way I'm aware of is creating a new SequenceView on<br>
> >>> > > your sequence which provides a modified view on the underlying sequence<br>
> >>> > > modeling your mutation. I think there are even some implementations out there<br>
> >>> > > based on this interface<br>
<br>
<br>
> >>> <a href="https://github.com/biojava/biojava/blob/master/biojava-core/src/main/java/org/biojava/nbio/core/sequence/edits/Edit.java" target="_blank">https://github.com/biojava/biojava/blob/master/biojava-core/src/main/java/org/biojava/nbio/core/sequence/edits/Edit.java</a><br>
> >>> > > but I never tried them. In my opinion, whether this approach makes sense for you<br>
> >>> > > is mainly a question of performance. (If you e.g. perform many mutations you<br>
> >>> > > would not want to create a copy of your whole sequence for each operation and<br>
> >>> > > have a chain of 1000 sequence views in the end.)<br>
<br>
> >>> > > Of course you are always free to create or modify an existing implementation<br>
> >>> > > of "Sequence" that offers additional methods for modification, but keep in mind<br>
> >>> > > that this would break the assumption of "atomic sequence objects", which seems<br>
> >>> > > to be intended in the current BioJava sequence model.<br>
<br>
> >>> > > Anyway, if anyone knows about any other ways to do that in BioJava or could<br>
> >>> > > think of a good way of integrating this functionality into the existing<br>
> >>> > > architecture (without building up an alternative sequence framework), I would<br>
> >>> > > be very interested to know.<br>
<br>
> >>> > > Best<br>
> >>> > > Ben<br>
<br>
> >>> > > Dipl. Biologe Ben Stöver<br>
> >>> > > Evolution und Biodiversity of Plants Group<br>
> >>> > > Institute for Evolution and Biodiversity<br>
> >>> > > University of Münster<br>
> >>> > > Germany<br>
> >>> > > <a href="http://www2.ieb.uni-muenster.de/EvolBiodivPlants/en/People/Stoever" target="_blank">http://www2.ieb.uni-muenster.de/EvolBiodivPlants/en/People/Stoever</a><br>
> >>> > > <a href="mailto:BenStoever@uni-muenster.de">BenStoever@uni-muenster.de</a><br>
<br>
<br>
<br>
> >>> > > LAW Andy schrieb am 2015-03-30:<br>
> >>> > > > I think the philosophical view on this is that the mutated<br>
> >>> > > > sequence<br>
> >>> > > > is a *new* and *different* sequence.<br>
<br>
> >>> > > > On 30 Mar 2015, at 09:30, Jose Manuel Duarte<br>
> >>> > > > <<a href="mailto:jose.duarte@psi.ch">jose.duarte@psi.ch</a>><br>
> >>> > > > wrote:<br>
<br>
> >>> > > > > Hi Jonas<br>
<br>
> >>> > > > > I'm not very familiar with the sequence part of Biojava, but after<br>
> >>> > > > > looking around a bit it seems that indeed there's no available way<br>
> >>> > > > > to mutate sequences. It looks like people using Biojava before had<br>
> >>> > > > > "read-only" applications in mind. I agree a setCompoundAt(int<br>
> >>> > > > > position) would be needed, it should actually be part of the<br>
> >>> > > > > Sequence interface. It would be a nice addition for 4.1.<br>
<br>
> >>> > > > > Anyway sorry I can't be of more help, perhaps someone else has some<br>
> >>> > > > > more background info on this.<br>
<br>
> >>> > > > > Jose<br>
<br>
<br>
<br>
> >>> > > > > On 28.03.2015 17:13, Jonas Dehairs wrote:<br>
> >>> > > > >> I want to introduce a mutation to a DNA sequence at a particular<br>
> >>> > > > >> location.<br>
> >>> > > > >> I can't seem to find a suitable method for this in the 4.0 API.<br>
> >>> > > > >> What would make most sense to me is a setCompoundAt(int position,<br>
> >>> > > > >> C compound) method in the AbstractSequence class, similar to the<br>
> >>> > > > >> getCompoundAt(int position) method, but this doesn't seem to<br>
> >>> > > > >> exist. And the mutator class seems to be for proteins only. How<br>
> >>> > > > >> can I do this?<br>
<br>
<br>
<br>
<br>
> >>> > > > --<br>
> >>> > > > The University of Edinburgh is a charitable body,<br>
> >>> > > > registered in<br>
> >>> > > > Scotland, with registration number SC005336.<br>
<br>
<br>
> >>> > > > _______________________________________________<br>
> >>> > > > Biojava-l mailing list - <a href="mailto:Biojava-l@mailman.open-bio.org">Biojava-l@mailman.open-bio.org</a><br>
> >>> > > > <a href="http://mailman.open-bio.org/mailman/listinfo/biojava-l" target="_blank">http://mailman.open-bio.org/mailman/listinfo/biojava-l</a><br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
> --<br>
> -----------------------------------------------------------------------<br>
> Dr. Andreas Prlic<br>
> RCSB PDB Protein Data Bank<br>
> University of California, San Diego<br>
<br>
> Editor Software Section<br>
> PLOS Computational Biology<br>
<br>
> BioJava Project Lead<br>
> -----------------------------------------------------------------------<br>
<br>
</div></div></blockquote></div><br></div></div></div></div></div></div>