[Biojava-l] Off Topic: Solutions to evenly overlapping substring problem??

Richard Holland holland at eaglegenomics.com
Tue Nov 4 10:49:17 UTC 2008


It's a maths problem.

Length of total sequence = L

Number of overlapping sequences required = N

Number of overlaps required = N-1

Length of each overlapping sequence required = S

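Offset for each overlapping sequence = X. The first sequence starts at
base 1 and the last must end at base L, so the N-1 offsets together
cover L - S bases: X = (L - S)/(N-1)

Overlap = O = S - X

In your case this gives:

X = (10763 - 2500) / (10-1)
= 8263 / 9
= 918 (rounded down; 8263 = 9 x 918 + 1)
O = 2500 - 918
= 1582

So you would start at the beginning, take S bases, then move along X
bases and take the next S, and so on... your first sequence would be
1..2500, your second would be 919..3418, your third would be
1837..4336, etc. etc., and each one would then overlap the next by
1582. Because of the remainder of 1, one of the nine steps has to be
919 rather than 918 (an overlap of 1581), so that the last sequence
ends exactly at 10763.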
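
A minimal sketch of the arithmetic in plain Java, in case it helps. It
assumes the fragments must tile the sequence exactly (first fragment
starts at base 1, last fragment ends at base L), so the total shift
between the first and last start is L - S, spread over N-1 steps. The
class and method names (FragmentPlanner, fragmentStarts) are made up
for illustration and are not part of BioJava.

```java
/** Hypothetical helper, not part of BioJava. */
class FragmentPlanner {

    /**
     * Returns the 1-based start of each of n fragments of length s on a
     * sequence of length len, with the first fragment starting at 1 and
     * the last fragment ending at len.
     */
    static int[] fragmentStarts(int len, int n, int s) {
        if (n < 2 || s < 1 || s > len) {
            throw new IllegalArgumentException("need n >= 2 and 1 <= s <= len");
        }
        int[] starts = new int[n];
        long span = len - s; // total shift between first and last start
        for (int i = 0; i < n; i++) {
            // Integer division spreads any remainder over the steps, so
            // consecutive offsets differ by at most one base.
            starts[i] = 1 + (int) (i * span / (n - 1));
        }
        return starts;
    }

    public static void main(String[] args) {
        int len = 10763, n = 10, s = 2500;
        int[] starts = fragmentStarts(len, n, s);
        for (int i = 0; i < n; i++) {
            int end = starts[i] + s - 1; // inclusive end coordinate
            System.out.println((i + 1) + ": " + starts[i] + ".." + end);
        }
    }
}
```

Each start/end pair could then be handed to whatever subsequence call
you normally use. Computing every start from i * (L - S) / (N-1),
rather than adding a rounded X each time, keeps the rounding error from
accumulating, so the last fragment always ends on base L.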

cheers,
Richard


2008/11/4 Mark Schreiber <markjschreiber at gmail.com>:
> Hi -
>
> Does anyone know how to solve this problem?
>
> I have a piece of DNA which is 10763 bp long. I want to divide this up
> evenly into 10 fragments each of 2500bp in length. What is the overlap
> required between each fragment?
>
> Or more generally, for a sequence of length L, how much overlap O is
> required to generate N fragments of length l (where N and l are fixed)?
>
> A solution would be most appreciated. Extra points for coding it in
> biojava and posting it on the cookbook!!
>
> - Mark



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


