[Biojava-l] Calculating edit distance between 2 DNA Sequences

Hannes Brandstätter-Müller biojava at hannes.oib.com
Tue Nov 15 14:10:43 UTC 2011


Well, I have implemented a first version that runs quite well
for my needs and specifications, although I have not integrated it
directly into the BioJava class hierarchy yet.
Is anyone interested in taking a look at it and giving me some
feedback on whether I should invest the time and work to make it
suitable for inclusion in BioJava?

Hannes

On Wed, Nov 9, 2011 at 10:01, Hannes Brandstätter-Müller
<biojava at hannes.oib.com> wrote:
> Thanks.
>
> I am thinking about implementing a modified MatrixAligner to fit my
> needs here. Direct Levenshtein distance is not exactly right for my
> application, because there are 0 to n insertions/deletions.
>
> A direct LevenshteinDistance as implemented in would not give me all
> the information I need.
>
> Thanks for the hints and input.
>
> Hannes
>
> PS we seriously need more examples in the Cookbook - I'll submit some
> later, but the modified N-W Aligner mentioned below would make a good
> example too, don't you think? ;)
>
> On Mon, Nov 7, 2011 at 16:03,  <forumjspro at gmail.com> wrote:
>> Hi Hannes,
>>
>> You could do such a comparison by using the Needleman-Wunsch aligner with the gap penalty set to -1 and the matrix set to -1 for mismatches and 0 for matches. The absolute value of the resulting score is exactly the number of errors.
>>
>> But it will not stop when a maximal number of errors is reached ...
>>
>> JS
>>
>> On 7 Nov 2011, at 15:42, Andreas Prlic wrote:
>>
>>> Hi Hannes,
>>>
>>> you are right, this does not exist yet. Somebody else asked the same
>>> question a few weeks ago. As such it would be great if you could
>>> provide a patch, there might be other people interested in that, too.
>>>
>>> Andreas
>>>
>>> On Mon, Nov 7, 2011 at 5:39 AM, Hannes Brandstätter-Müller
>>> <biojava at hannes.oib.com> wrote:
>>>> Following up:
>>>>
>>>> If there is no such thing, should I make it available if I write it?
>>>>
>>>> Hannes
>>>>
>>>> On Thu, Nov 3, 2011 at 14:08, Hannes Brandstätter-Müller
>>>> <biojava at hannes.oib.com> wrote:
>>>>> Hi!
>>>>>
>>>>> Is there a Class/Method in Biojava that calculates the Levenshtein
>>>>> distance between two sequences? I could not find anything in the docs
>>>>> at first search.
>>>>>
>>>>> I need to compare 2 DNASequences (or Strings) and get the number of
>>>>> insertions, deletions, and substitutions. Ideally, there would be an
>>>>> option to abort the comparison if the number of mismatches exceeds a
>>>>> certain number.
>>>>>
>>>>> Hannes
>>>>>
>>>>
>>>> _______________________________________________
>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>
>>>
>>
>>
>
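
For readers of the thread, here is a minimal sketch of the calculation discussed above. It is not the implementation mentioned in this message and it is not part of BioJava; the class name EditDistanceSketch and the method boundedDistance are hypothetical. It fills the same dynamic-programming table a global aligner would fill with a score of 0 for matches and -1 for mismatches and for each gapped position, but it stores positive costs, so the final cell holds the number of errors directly. It also aborts once no alignment can stay within a given maximum number of errors, which is the option asked for in the original question. It works on plain Strings and returns only the total, not separate counts of insertions, deletions and substitutions.

```java
/** Hypothetical helper for illustration only; not a BioJava class. */
final class EditDistanceSketch {

    /**
     * Returns the Levenshtein distance between a and b, or -1 as soon as
     * it is certain that the distance is larger than maxErrors.
     */
    static int boundedDistance(String a, String b, int maxErrors) {
        // The length difference is a lower bound on the distance.
        if (Math.abs(a.length() - b.length()) > maxErrors) {
            return -1;
        }

        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // empty prefix of a versus j characters of b
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // i characters of a versus empty prefix of b
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int substitute = prev[j - 1] + cost;
                int delete = prev[j] + 1;     // gap in b
                int insert = curr[j - 1] + 1; // gap in a
                curr[j] = Math.min(substitute, Math.min(delete, insert));
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Costs never decrease along a path through the table, so if
            // every cell in this row is over the limit, the result is too.
            if (rowMin > maxErrors) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }

        int distance = prev[b.length()];
        return distance <= maxErrors ? distance : -1;
    }
}
```

A call such as EditDistanceSketch.boundedDistance(seq1, seq2, 5) would return the distance if it is at most 5 and -1 otherwise. Keeping only two rows of the table saves memory but rules out a traceback; reporting the individual edit operations would require keeping the full table.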



