[Biojava-dev] Changes to Sequence in BioJava3

Andy Yates ayates at ebi.ac.uk
Tue Nov 2 15:16:47 UTC 2010


Hi everyone,

As a heads-up to anyone with implementations already built on the Sequence interface: I'm proposing a few changes to it. These will cause a binary class incompatibility & will affect the methods you need to implement, but I'll sort those out at the BioJava core end.

1). Removal of getSequenceAsString(Integer,Integer,Strand)
** The implementation is patchy & buggy, often exposing data from backing stores

2). Addition of SequenceView<C> getReverse()
** Will return the sequence in the reverse strand
** Also complemented if applicable

3). Addition of isComplementable() to CompoundSet
** Used to support getReverse() above
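
To illustrate how isComplementable() could support getReverse(), here is a minimal standalone sketch. It does not use BioJava: SimpleCompoundSet, DNA, PROTEIN and reverse() are hypothetical names made up for the example, and the real implementation may well differ.

```java
public class ReverseSketch {

    // Hypothetical, simplified stand-in for CompoundSet (not the BioJava interface).
    interface SimpleCompoundSet {
        boolean isComplementable();
        char complement(char compound);
    }

    // DNA compounds can be complemented: A<->T, G<->C.
    static final SimpleCompoundSet DNA = new SimpleCompoundSet() {
        @Override public boolean isComplementable() { return true; }
        @Override public char complement(char compound) {
            switch (compound) {
                case 'A': return 'T';
                case 'T': return 'A';
                case 'G': return 'C';
                case 'C': return 'G';
                default:  return compound; // unknown symbols pass through unchanged
            }
        }
    };

    // Amino acids have no complement, so only the order is reversed.
    static final SimpleCompoundSet PROTEIN = new SimpleCompoundSet() {
        @Override public boolean isComplementable() { return false; }
        @Override public char complement(char compound) { return compound; }
    };

    // Reverses the sequence, complementing each compound only if the set allows it.
    static String reverse(String sequence, SimpleCompoundSet set) {
        StringBuilder out = new StringBuilder(sequence.length());
        for (int i = sequence.length() - 1; i >= 0; i--) {
            char c = sequence.charAt(i);
            out.append(set.isComplementable() ? set.complement(c) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("TGCG", DNA));    // prints CGCA
        System.out.println(reverse("MKV", PROTEIN)); // prints VKM
    }
}
```

The point of the flag is that callers get one method for both cases, rather than having to know whether the underlying compounds have complements.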

This means substrings of Sequences are retrieved like so:

DNASequence d = new DNASequence("ATGCGC");
d.getSubSequence(2, 5).getSequenceAsString(); //Returns TGCG (positions 2 to 5, 1-based & inclusive)
d.getSubSequence(2, 5).getReverse().getSequenceAsString(); //Returns CGCA (TGCG reversed & complemented)

To support negative-strand indexes you can use the Location objects (the returned Location is expressed in positive-strand coordinates):

Location l = Location.Tools.location(5, 2, Strand.NEGATIVE, d.getLength());
SequenceView<NucleotideCompound> locationSeq = l.getSubSequence(d);
locationSeq.getSequenceAsString(); //Returns CGCA
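
The coordinate conversion can be sketched in plain Java. This is an assumption about how negative-strand indexes map onto the positive strand (position p becomes length - p + 1), not a copy of the Location code; toPositive(), reverseComplement() and negativeStrandRegion() are hypothetical helper names.

```java
public class NegativeStrandSketch {

    // Assumed mapping: a 1-based position counted along the negative strand
    // corresponds to (length - position + 1) on the positive strand.
    static int toPositive(int position, int length) {
        return length - position + 1;
    }

    // Reverse-complements a DNA string (A<->T, G<->C); other symbols pass through.
    static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            char c = dna.charAt(i);
            switch (c) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'G': out.append('C'); break;
                case 'C': out.append('G'); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Reads a region given in negative-strand coordinates (start >= end).
    static String negativeStrandRegion(String dna, int start, int end) {
        int plusStart = toPositive(start, dna.length());
        int plusEnd = toPositive(end, dna.length());
        // 1-based inclusive range -> String.substring's 0-based, end-exclusive range
        String plusStrand = dna.substring(plusStart - 1, plusEnd);
        return reverseComplement(plusStrand);
    }

    public static void main(String[] args) {
        // 5..2 on the negative strand of a 6-base sequence maps to 2..5 on the positive strand
        System.out.println(negativeStrandRegion("ATGCGC", 5, 2)); // prints CGCA
    }
}
```

In this example the region is symmetric (2 + 5 = length + 1), so the converted coordinates are again 2 and 5; on a longer sequence the positive-strand range would shift accordingly.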

Hopefully the impact of these changes will be small & the code will benefit from them.

Andy 

P.S. If you are wondering why I am not proposing a deprecation, it is because I do not want developers writing quite complex code that depends on this functionality. If this were not an alpha release, deprecation would be the only way to go.




