[Biojava-dev] Changes to Sequence in BioJava3

Andy Yates ayates at ebi.ac.uk
Fri Nov 5 16:24:11 UTC 2010


Hi Trevor,

I am sorry that these changes have caused you so many issues, but I do hope that this will clear up the problems with using the interfaces.

So, point number one. If you want a "dirty" way of doing it, try out SequenceMixin.toString(), which uses the iterator linked to the Sequence to build a String. If you want, I'd be happy to have a quick squint at the code & suggest/contribute a patch to get it working; after all, I do straddle the BioJava/EnsemblGenomes divide.
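
To give a rough idea of what "build a String from the iterator" means, here is a minimal sketch. It deliberately does not use the BioJava classes: the class name IteratorToString, the helper toSequenceString() and the use of plain Character compounds are all made up for illustration, so treat it as an outline of the idea rather than what SequenceMixin.toString() actually contains.

```java
import java.util.Arrays;
import java.util.List;

public class IteratorToString {

    // Hypothetical helper (not BioJava API): walks any Iterable of compounds
    // and appends each compound's String form. It only needs an iterator,
    // so it never has to reach into a backing store directly.
    public static <C> String toSequenceString(Iterable<C> sequence) {
        StringBuilder sb = new StringBuilder();
        for (C compound : sequence) {
            sb.append(compound.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A List stands in for a Sequence here; both are Iterable.
        List<Character> dna = Arrays.asList('A', 'T', 'G', 'C', 'G', 'C');
        System.out.println(toSequenceString(dna)); // prints ATGCGC
    }
}
```

The point of going through the iterator is that a lazy-loading Sequence only has to hand out compounds in order; it does not need to implement every other method first.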

As for your second point: SequenceView inherits from Sequence, so all SequenceView objects should have the getInverse() method. The idea of the inverse method was to get away from the need to "reverse" & "complement" a Sequence in order to get the Sequence on the reverse strand. This should simplify your code, shrinking "I need the opposite strand of this DNA" to:

SequenceView<NucleotideCompound> inv = dna.getInverse();

Inverse of a subsequence:

SequenceView<NucleotideCompound> inv = dna.getSubSequence(10, 30).getInverse();

The CompoundSet tells the default impl in SequenceMixin.inverse() what it needs to do for a given Sequence, so that one impl can be shared amongst Sequence impls which do not share a common hierarchy.
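
As a sketch of how a shared helper can lean on the compound set, the following stand-alone example builds a read-only "inverse" view over a list of characters. Everything in it is an assumption made for illustration (ToyCompoundSet, the DNA and PROTEIN constants, inverse() and asString() are hypothetical names, and java.util.List stands in for Sequence); it is not the BioJava implementation, only the same idea in miniature.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InverseViewSketch {

    // Hypothetical stand-in for a compound set: it knows whether its
    // compounds can be complemented and, if so, how.
    public interface ToyCompoundSet {
        boolean isComplementable();
        char complement(char c);
    }

    public static final ToyCompoundSet DNA = new ToyCompoundSet() {
        private final Map<Character, Character> pairs = new HashMap<>();
        {
            pairs.put('A', 'T');
            pairs.put('T', 'A');
            pairs.put('G', 'C');
            pairs.put('C', 'G');
        }
        public boolean isComplementable() { return true; }
        public char complement(char c) { return pairs.get(c); }
    };

    public static final ToyCompoundSet PROTEIN = new ToyCompoundSet() {
        public boolean isComplementable() { return false; }
        public char complement(char c) {
            throw new UnsupportedOperationException("not complementable");
        }
    };

    // Shared helper: returns a view that reads the underlying list backwards
    // and complements each compound only when the compound set allows it.
    // Nothing is copied, so it works for any List-backed "sequence".
    public static List<Character> inverse(final List<Character> seq,
                                          final ToyCompoundSet set) {
        return new AbstractList<Character>() {
            @Override
            public Character get(int i) {
                char c = seq.get(seq.size() - 1 - i);
                return set.isComplementable() ? set.complement(c) : c;
            }
            @Override
            public int size() { return seq.size(); }
        };
    }

    public static String asString(List<Character> seq) {
        StringBuilder sb = new StringBuilder();
        for (char c : seq) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Character> dna = Arrays.asList('A', 'T', 'G', 'C', 'G', 'C');
        // 1-based inclusive positions 2..5 correspond to subList(1, 5).
        List<Character> sub = dna.subList(1, 5);
        System.out.println(asString(sub));               // prints TGCG
        System.out.println(asString(inverse(sub, DNA))); // prints CGCA
    }
}
```

Because the helper only asks the compound set whether complementing applies, the same code serves DNA (reverse and complement) and something like protein (reverse only) without the two needing a common superclass.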

Anyway, if you can let me know where in your code base the relevant classes are, I can do my best to help out. After all, I see the Ensembl Java API as an important use-case for the BioJava API :)

All the best mate,

Andy

On 5 Nov 2010, at 16:04, PATERSON Trevor wrote:

> Hi Andy et al.
> 
> I have just been looking at the changes to Sequence that you were discussing and checked in last week...
> 
> Just to let you know :)..
> 
> These were a bit awkward for me, as I have implemented my own Reader/BackingStore to lazy load sequences from Ensembl, and I hadn't implemented most of the methods that would be needed to use the new way of getting Sequence strings.
> 
> So as a temporary fix my subclass overrides the AbstractSequence method getSequenceAsString() to do it the old way through the Reader, as does a method I am using, getSequenceAsString(Integer, Integer).
> 
> As we are trying to get a publishable version of our Ensembl API together (which will use your first BioJava release version), I don't want to spend much time altering things to do it the new way at this stage. If I get time (& money) I will have a look at implementing a fully functional reader using your new approach.
> 
> 
> On a related tack, and something you have helped me out with before..
> 
> If we want to get the 'reverse' and 'complement' of a subsequence, it still seems to be the case that you need to make an intermediate Sequence object from the SubSequence View, as these methods aren't available on the View interface... Is that correct?
> 
> As I have mentioned, we are trying to write the Ensembl API up, and as a demo of potential usage we have made a little plug-in for the Savant genome browser that uses the Ensembl Java API to pull chromosomes and annotations out of Ensembl... We have a SourceForge Project for all this now ( http://jensembl.sourceforge.net/ ) - so it will be excellent when we can tie our code to your first release version.
> 
> Cheers 
> Trevor
> 
> Trevor Paterson PhD
> email trevor.paterson at roslin.ed.ac.uk
> 
> Bioinformatics 
> The Roslin Institute
> The Royal (Dick) School of Veterinary Studies
> University of Edinburgh
> Scotland EH25 9PS
> phone +44 (0)131 5274197
> http://bioinformatics.roslin.ed.ac.uk/
> 
> The University of Edinburgh is a charitable body, registered in Scotland with registration number SC005336
> 
> -----Original Message-----
> From: biojava-dev-bounces at lists.open-bio.org [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andy Yates
> Sent: 03 November 2010 10:56
> To: Andreas Prlic
> Cc: biojava-dev
> Subject: Re: [Biojava-dev] Changes to Sequence in BioJava3
> 
> It is, which is why I want people to check their code still works. I can only run tests from my end :)
> 
> Andy
> 
> On 2 Nov 2010, at 21:54, Andreas Prlic wrote:
> 
>> thanks! looks like a major patch...
>> A
>> 
>> On Tue, Nov 2, 2010 at 4:52 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
>>> As I said earlier, these changes were going in. They are now checked in. Can people check their code still works against this? I've had to make some changes to core (obviously), genomic & alignment. Test cases all pass but I'd be happier once everyone okays this.
>>> 
>>> If so then I can push out a release.
>>> 
>>> Andy
>>> 
>>> On 2 Nov 2010, at 15:16, Andy Yates wrote:
>>> 
>>>> Hi everyone,
>>>> 
>>>> As a caution to people with implementations already built on the Sequence interface: I'm proposing a couple of changes to it. This will cause a binary class incompatibility & will affect the methods you need to implement, but I'll sort them out at the BioJava core end.
>>>> 
>>>> 1). Removal of getSequenceAsString(Integer,Integer,Strand)
>>>> ** The implementation is patchy & buggy, often exposing data from backing stores
>>>> 
>>>> 2). Addition of SequenceView<C> getReverse()
>>>> ** Will return the sequence in the reverse strand
>>>> ** Also complemented if applicable
>>>> 
>>>> 3). Addition of isComplementable() to CompoundSet
>>>> ** Used to support the above function
>>>> 
>>>> This means substrings of Sequences are retrieved like so:
>>>> 
>>>> DNASequence d = new DNASequence("ATGCGC");
>>>> d.getSubSequence(2, 5).getSequenceAsString(); //Returns TGCG
>>>> d.getSubSequence(2, 5).getReverse().getSequenceAsString(); //Returns CGCA
>>>> 
>>>> To support -ve strand indexes you can use the Location objects (the returned Location is expressed in +ve coordinates):
>>>> 
>>>> Location l = Location.Tools.location(5, 2, Strand.NEGATIVE, d.getLength());
>>>> SequenceView<NucleotideCompound> locationSeq = l.getSubSequence(d);
>>>> locationSeq.getSequenceAsString(); //Returns CGCA
>>>> 
>>>> Hopefully the implications of these changes will be small & will benefit the code.
>>>> 
>>>> Andy
>>>> 
>>>> p.s. If you are wondering why I am not proposing a deprecation, it is because I do not want developers writing quite complex code depending on this functionality. If this was not an alpha release then deprecation would be the only way to go.
>>>> 
>>>> 
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>> 
>>> --
>>> Andrew Yates                   Ensembl Genomes Engineer
>>> EMBL-EBI                       Tel: +44-(0)1223-492538
>>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
> 

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/