[Biojava-l] How to find a sequence within a larger sequence and flip it

Doug Swisher big.swish at gmail.com
Fri Sep 19 03:37:59 UTC 2008


Hi,

I'm pretty new to BioJava, and I'm a bit stuck.  I'm hoping someone can help
out a bit...even if it's just a hint as to where to look next.

I have a long DNA sequence and a shorter sequence that exists within the
larger one.  I want to find the location of the smaller sequence within the
larger one, and then create a new sequence with the small one flipped
end-for-end.  That's confusing, so let me give an example.

Long sequence: aaaagacttttt
Short sequence: gact
Goal sequence: aaaatcagtttt

To find the location of the short sequence within the larger one, I could
certainly do some string manipulation:

    SymbolList bigDNA = DNATools.createDNA("aaaagacttttt");
    SymbolList subDNA = DNATools.createDNA("gact");
    int start = bigDNA.seqString().indexOf(subDNA.seqString());

While that would work, I'm wondering if there is a more efficient method
that avoids the conversion to strings (in my real code, I start with
Sequences, not strings; I used SymbolLists here for simplicity).
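
One possible way to skip the string conversion is to search a list of symbols directly. The sketch below uses only java.util, with Characters standing in for BioJava Symbol objects; Collections.indexOfSubList does the search. The names SubListSearch, toSymbols and find are made up for illustration. Passing bigDNA.toList() and subDNA.toList() in place of the character lists is an assumption to check against the BioJava Javadoc. Note that the offset returned here is 0-based, while SymbolList positions are 1-based.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in demo: Characters play the role of BioJava Symbol objects.
class SubListSearch {
    // Turn a string into a list of "symbols" (hypothetical helper).
    static List<Character> toSymbols(String s) {
        List<Character> out = new ArrayList<>(s.length());
        for (char c : s.toCharArray()) {
            out.add(c);
        }
        return out;
    }

    // 0-based offset of the first occurrence of sub within big, or -1 if absent.
    static int find(List<?> big, List<?> sub) {
        return Collections.indexOfSubList(big, sub);
    }

    public static void main(String[] args) {
        int start = find(toSymbols("aaaagacttttt"), toSymbols("gact"));
        System.out.println(start); // prints 4
    }
}
```

Whether this is actually faster than indexOf on the strings depends on how the list view is implemented, so it is worth measuring on real data.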

To "excise" the short sequence, flip it around, and construct a new
SymbolList, I could also do some string manipulation, as in the following:

    StringBuilder middle = new StringBuilder(subDNA.seqString());
    // Everything before the match: ends at start, not at subDNA.length()
    String leftPart = bigDNA.seqString().substring(0, start);
    // Everything after the match
    String rightPart = bigDNA.seqString().substring(start + subDNA.length());
    SymbolList goalDNA = DNATools.createDNA(leftPart + middle.reverse() + rightPart);

Looking at the documentation, such as ProjectionUtils or SymbolList.edit(),
it appears there might be some support for manipulating the sequence
directly.  Is there a way to do it, without again dropping "down" to
strings?
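
To make the goal concrete without strings, here is a minimal sketch that reverses the matched region on a copy of the list, again using only java.util with Characters as stand-ins for Symbols. SubListFlip, toSymbols, render and flipFirst are hypothetical names. This is a plain end-for-end reversal, not a reverse complement. Doing the same thing on a real SymbolList, for example through SymbolList.edit(), is not shown here and would need to be checked against the BioJava documentation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in demo: Characters play the role of BioJava Symbol objects.
class SubListFlip {
    // Turn a string into a list of "symbols" (hypothetical helper).
    static List<Character> toSymbols(String s) {
        List<Character> out = new ArrayList<>(s.length());
        for (char c : s.toCharArray()) {
            out.add(c);
        }
        return out;
    }

    // Join the symbols back into a string, for display only.
    static String render(List<Character> symbols) {
        StringBuilder sb = new StringBuilder(symbols.size());
        for (char c : symbols) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Copy big, then reverse the first occurrence of sub inside the copy.
    // If sub does not occur, the copy is returned unchanged.
    static <T> List<T> flipFirst(List<T> big, List<T> sub) {
        List<T> result = new ArrayList<>(big);
        int start = Collections.indexOfSubList(result, sub);
        if (start >= 0) {
            // subList is a view, so reversing it edits result in place.
            Collections.reverse(result.subList(start, start + sub.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Character> goal = flipFirst(toSymbols("aaaagacttttt"), toSymbols("gact"));
        System.out.println(render(goal)); // prints aaaatcagtttt
    }
}
```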

Thanks in advance for any assistance.

Cheers,
-Doug

P.S. Yeah, the second code snippet is pretty inefficient; I was trying to be
clear rather than efficient.


