[Biojava-l] BioJava translation

Andy Yates ayates at ebi.ac.uk
Wed Oct 13 22:52:17 UTC 2010


LOL, well, you could always parallelise it :)
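
Taken literally, parallelising does help when there are many independent sequences to translate. Below is a minimal sketch, assuming Java 8+ and that each sequence can be translated on its own. The class ParallelTranslation and its simplified translate() (standard genetic code, uppercase DNA only) are hypothetical stand-ins, not BioJava's translator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: translate many independent sequences with a Java 8 parallel
 * stream. translate() is a simplified stand-in (standard genetic code, uppercase
 * DNA only), not BioJava's translator.
 */
public final class ParallelTranslation {

    private static final String BASES = "TCAG";
    // Amino acids for all 64 codons, bases ordered T, C, A, G at each position.
    private static final String CODE =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Stand-in translator: it keeps no shared mutable state, so calls can run concurrently.
    static String translate(String dna) {
        StringBuilder protein = new StringBuilder(dna.length() / 3);
        for (int i = 0; i + 2 < dna.length(); i += 3) {
            int a = BASES.indexOf(dna.charAt(i));
            int b = BASES.indexOf(dna.charAt(i + 1));
            int c = BASES.indexOf(dna.charAt(i + 2));
            // Any non-TCAG base makes the codon ambiguous, reported as 'X'.
            protein.append(a < 0 || b < 0 || c < 0 ? 'X' : CODE.charAt(16 * a + 4 * b + c));
        }
        return protein.toString();
    }

    // Maps translate() over the list on the common fork-join pool; output order matches input order.
    static List<String> translateAll(List<String> sequences) {
        return sequences.parallelStream()
                .map(ParallelTranslation::translate)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> cds = Arrays.asList("ATGTGGTAA", "ATGAAACCCGGG", "ATGNNN");
        System.out.println(translateAll(cds)); // prints [MW*, MKPG, MX]
    }
}
```

Whether this pays off depends on the number of cores and on the real translator being safe to call from several threads at once.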

I've gone & pushed a new version of the translator code to the SVN repo, so it'll filter through to the public server soon. There's an added test case as well. The change seems to be worth about 25 extra translations of BRCA2 per second, so it is significant; our current ceiling looks to be approx. 200 translations per second.

I hope you find this faster, now that there's no need to edit & parse a Sequence String twice.
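
For comparison, the direct DNA-to-protein, String-to-String translation discussed in this thread can be done in a single pass over the sequence. This is a hypothetical sketch using the standard genetic code (NCBI table 1), not the BioJava 3 API. DirectTranslator and its behaviour (lowercase and U accepted, ambiguous codons become X, trailing bases ignored) are assumptions for illustration.

```java
/**
 * Hypothetical single-pass DNA-to-protein translator (standard genetic code).
 * Not the BioJava 3 API: a plain String-to-String translation with no
 * intermediate RNA object and no second parse of the sequence.
 */
public final class DirectTranslator {

    // Amino acids for all 64 codons, bases ordered T, C, A, G at each position.
    private static final String CODE =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Maps a nucleotide (either case, T or U) to its index in T, C, A, G order; -1 otherwise.
    private static int baseIndex(char c) {
        switch (Character.toUpperCase(c)) {
            case 'T': case 'U': return 0;
            case 'C': return 1;
            case 'A': return 2;
            case 'G': return 3;
            default:  return -1;
        }
    }

    /** Translates every complete codon; ambiguous codons become 'X', stops become '*'. */
    public static String translate(String dna) {
        StringBuilder protein = new StringBuilder(dna.length() / 3);
        for (int i = 0; i + 2 < dna.length(); i += 3) {
            int a = baseIndex(dna.charAt(i));
            int b = baseIndex(dna.charAt(i + 1));
            int c = baseIndex(dna.charAt(i + 2));
            protein.append(a < 0 || b < 0 || c < 0 ? 'X' : CODE.charAt(16 * a + 4 * b + c));
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("ATGGCCattTAA")); // prints MAI*
    }
}
```

This skips the validation and object creation described in the thread, which is exactly the trade-off: it is fast, but it checks nothing beyond the codon lookup.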

Andy

On 13 Oct 2010, at 20:16, Scooter Willis wrote:

> Pjotr
> 
> What's an extra 8 seconds among friends, if you know you are going to get the correct answer and can change the rules if needed?
> 
> Are you parsing the C. elegans genome or the DNA representation of each protein in the C. elegans genome?
> 
> If you take out the println statement, that will speed things up a bunch. Java's System.out is always slow.
> 
> I am checking on the problem with upper case. That shouldn't be an issue.
> 
> Thanks
> 
> Scooter
> 
> 
> On Wed, Oct 13, 2010 at 2:17 PM, Pjotr Prins <pjotr.public23 at thebird.nl> wrote:
> I think it is a good idea. From a purist point of view you may object
> (it is not biological), but most libraries do exactly that.
> 
> If direct translation gets it down to 8 sec, we may well halve that
> with further tweaking.
> 
> Pj.
> 
> On Wed, Oct 13, 2010 at 01:16:01PM -0400, Scooter Willis wrote:
> > BioJava 3 has an additional validation layer, plus object creation going
> > from a DNA sequence to an RNA sequence and then applying the appropriate
> > translation rules to return a protein sequence. It could easily be twice as
> > fast if you went directly from DNA sequence to ProteinSequence, which would
> > put it at 8 seconds. We are going to carry a performance penalty for setting
> > everything up as a proper object versus doing a simple String-to-String
> > translation.
> 
> 
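
On the println point in the quoted reply: on typical JVMs System.out is set up to flush on each println, so printing every result separately adds a write per line. Below is a minimal sketch, assuming the results are Strings. BufferedOutput and writeResults are hypothetical names; the idea is to wrap System.out once in a BufferedWriter and flush at the end.

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: send output through one buffered writer and flush once,
 * instead of calling System.out.println for every result.
 */
public final class BufferedOutput {

    // Writes one line per result through the given writer, then flushes once at the end.
    static void writeResults(List<String> results, Writer out) {
        PrintWriter pw = new PrintWriter(out); // no autoflush, so lines accumulate in the buffer
        for (String r : results) {
            pw.println(r);
        }
        pw.flush(); // flushes the whole chain, down to System.out when that is the target
    }

    public static void main(String[] args) {
        List<String> proteins = Arrays.asList("MW*", "MKPG");
        // Wrap System.out once, with a 64 KiB buffer, rather than printing each line directly.
        Writer stdout = new BufferedWriter(
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8), 1 << 16);
        writeResults(proteins, stdout);
    }
}
```

Taking writeResults a Writer rather than System.out directly also makes it easy to send the results to a file or a StringWriter instead.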

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
