[Biojava-l] Off Topic: Solutions to evenly overlapping substring problem??

Mark Schreiber markjschreiber at gmail.com
Tue Nov 4 12:55:58 UTC 2008


Hi -

This is what I thought as well, but if you use that number to generate
the substrings it doesn't work. The value 1585 works (with one
character left over). I'm not sure how to make that into a
generalizable formula, though.

- Mark
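A minimal Java sketch of one general answer, assuming the first fragment must start at base 1 and the last must end exactly at base L. Under that assumption N fragments of length S span (N - 1)·X + S bases, so consecutive starts are spaced X = (L - S)/(N - 1) apart and the overlap is O = S - X. For L = 10763, N = 10, S = 2500 this gives X = 8263/9 ≈ 918.1, so no single integer overlap fits exactly; rounding each start separately gives overlaps of 1581 or 1582 bp. The class and method names (OverlapTiler, tile) are hypothetical and plain Java, not part of the BioJava API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not BioJava API): evenly spaced, equal-length,
 * overlapping fragments that start at base 1 and end exactly at base len.
 */
public class OverlapTiler {

    /**
     * Returns 1-based inclusive {start, end} pairs for n fragments of
     * length s covering a sequence of length len.
     */
    public static List<int[]> tile(int len, int n, int s) {
        // Coverage without gaps needs step <= s, i.e. n * s >= len
        if (n < 2 || s > len || (long) n * s < len) {
            throw new IllegalArgumentException("need n >= 2, s <= len and n * s >= len");
        }
        List<int[]> fragments = new ArrayList<>();
        // Exact (generally non-integer) step between starts: X = (len - s) / (n - 1)
        double step = (double) (len - s) / (n - 1);
        for (int i = 0; i < n; i++) {
            // Round each start on its own so rounding error never accumulates
            int start = 1 + (int) Math.round(i * step);
            fragments.add(new int[] {start, start + s - 1});
        }
        return fragments;
    }

    public static void main(String[] args) {
        int len = 10763, n = 10, s = 2500;
        double step = (double) (len - s) / (n - 1);
        System.out.printf("step X = %.2f, overlap O = S - X = %.2f%n", step, s - step);
        for (int[] f : tile(len, n, s)) {
            System.out.println(f[0] + ".." + f[1]);
        }
    }
}
```

Running main prints the fractional step and overlap, then ten ranges from 1..2500 to 8264..10763. The guard reflects the fact that gap-free tiling is only possible when N·S >= L.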


On Tue, Nov 4, 2008 at 6:49 PM, Richard Holland
<holland at eaglegenomics.com> wrote:
> It's a maths problem.
>
> Length of total sequence = L
>
> Number of overlapping sequences required = N
>
> Number of overlaps required = N-1
>
> Length of each overlapping sequence required =S
>
> Offset for each overlapping sequence = length of one non-overlapping
> sequence = L/(N-1) = X
>
> Overlap = O = S - X
>
> In your case this gives:
>
> X = 10763 / (10-1)
> = 10763  / 9
> = 1196 (rounded up)
> O = 2500 - 1196
> = 1304
>
> So you would start at the beginning, take S bases, then move along X
> bases and take the next S, and so on... your first sequence would be
> 1..2500, your second would be 1197..3696, your third would be
> 2393..4892, etc. etc., and each one would then overlap the next by
> 1304.
>
> cheers,
> Richard
>
>
> 2008/11/4 Mark Schreiber <markjschreiber at gmail.com>:
>> Hi -
>>
>> Does anyone know how to solve this problem?
>>
>> I have a piece of DNA which is 10763 bp long. I want to divide this up
>> evenly into 10 fragments each of 2500bp in length. What is the overlap
>> required between each fragment?
>>
>> Or more generally, for a sequence of length L, how much overlap O is
>> required to generate N fragments of length l (where N and l are fixed)?
>>
>> A solution would be most appreciated. Extra points for coding it in
>> biojava and posting it on the cookbook!!
>>
>> - Mark
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
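A quick numerical check, offered as a sketch, of why the offset X = L/(N - 1) from the quoted reply does not work: N fragments of length S whose starts are X apart span (N - 1)·X + S bases, and for the fragments to end at the last base that span has to equal L. With X = 1196 the span is 13264 bp against a 10763 bp sequence, so the tenth fragment would start at base 10765, past the end of the sequence. OffsetCheck and span are hypothetical names used only for illustration.

```java
/**
 * Hypothetical check: total span of equally spaced fragments versus the
 * sequence length, using the offset X = L / (N - 1) rounded up.
 */
public class OffsetCheck {

    // Bases spanned by n fragments of length s whose starts are step apart
    public static long span(long n, long s, long step) {
        return (n - 1) * step + s;
    }

    public static void main(String[] args) {
        long len = 10763, n = 10, s = 2500;
        // X = L / (N - 1) rounded up, via integer ceiling division -> 1196
        long naive = (len + n - 2) / (n - 1);
        // For the last fragment to end at base L the span must equal len
        System.out.println("X = " + naive + ", span = " + span(n, s, naive)
                + " bp, sequence = " + len + " bp");
    }
}
```

This prints `X = 1196, span = 13264 bp, sequence = 10763 bp`: the overshoot of 2501 bp is roughly one extra fragment length, because dividing L by N - 1 leaves no room for the final fragment's own S bases.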


