[Biojava-dev] Using DNASequence reverseComplement

Andy Yates ayates at ebi.ac.uk
Mon May 17 16:07:36 UTC 2010


Hi Trevor,

You've stumbled right into something myself & Scooter are trying to clean-up now. The assumption I had originally made is that all operations on things like getSequenceAsString() would go via the view since that's where the logic is located for both the reversing & the complementing code. That call now delegates onto the backing store and not the view which means you get these very odd results happening.

For the moment I think the following code would do what you're expecting:

seq.getReverseComplement().getSubSequence(1,4).getSequenceAsString()

It's annoying because you're in the reverse coordinate system you've got to reverse the original coordinates so asking for position 11,14 just isn't going to work. The other way of working with this would be to construct the views yourself and pass in a subsequence of the original sequence i.e.

new ReversedSequenceView<NucleotideCompound>(new ComplementSequenceView<NucleotideCompound>(seq.getSubSequence(11,14));

This is really a problem with the erasure of the Sequence types from DNASequence. If DNASequence returned the same type from its subsequence method then you would just call revComp on that and it would have been fine.

The thing to take away from are:

* getSequenceAsString(Integer, Integer, Strand) is not well supported atmo

* So long as we are sure it should remain it will be

* There should be no reason to materialise the Sequence into a String to get a part of the API working. If there is then we've messed up

Andy

p.s. The strand stuff is confusing; originally it was meant to be +ve & -ve strands but assumed that the Sequence you had was always on the +ve strand. Eventually the meaning will come back but will require the methods to be more aware of the strand DNA is on to make the right call about what you want to
do. This all ties in with circular genomes support and locations 

On 17 May 2010, at 16:24, PATERSON Trevor wrote:

> Sorry for raising that behemoth earlier..
> 
> I have a separate problem with the DNASequence API -
> 
> Probably I just don't understand how to use the View objects
> 
> 
> If I make a DNASequence
> 
> DNASequence seq = new DNASequence("AAAAACCCCGGGTT");
> 
> i.e. length = 14,
> 
> I might reasonably want to get the ReverseComplement of bases 11-14, which should 'be' "AACC"
> 
> But I cannot manage to get this in one easy step....
> 
> seq.toString(): AAAAACCCCGGGTT --> FINE
> 
> seq.getReverseComplement().getSequenceAsString(): AACCCGGGGTTTTT --> FINE
> 
> But when I try to use bounds on this complement - methods refer back to the original seq's iterator, not the complement
> 
> seq.getReverseComplement().getSequenceAsString(11,14,Strand.POSITIVE): GGTT 
> 	i.e the same as seq.getSequenceAsString(11,14,Strand.POSITIVE)
> seq.getReverseComplement().getSequenceAsString(11,14,Strand.NEGATIVE): TTGG
> 	i.e the same as seq.getSequenceAsString(11,14,Strand.NEGATIVE)
> 
> Is this the desired behaviour? How would I get the desired reverseComplement fragment?
> 
> The only obvious way that I can see is  
> 
>                DNASequence subseq = new DNASequence(seq.getSequenceAsString(11, 14, Strand.POSITIVE));
>                System.out.println(""+ subseq.getReverseComplement().getSequenceAsString());
> 
> _____________________________________________________________________________________________
> 
> On a related point I was mightily confused by the Strand.POSITIVE/Strand.NEGATIVE enumeration
> 
> I was naively interpreting them to refer to the strand of the DNA: 
> Whereas they infact refer to the directionality of the Iterator *on the same Strand*
> 
> A better name might be Direction:FORWARDS/Direction.BACKWARDS?
> Positive and negative strand has loaded biological meaning for newbies like me ( sense versus antisense )
> So I made the assumption that a Strand.NEGATIVE call would itself reverseComplement
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/








More information about the biojava-dev mailing list