[Biojava-l] BioJava translation

Andy Yates ayates at ebi.ac.uk
Wed Oct 13 16:25:41 UTC 2010


That's great news. It should get even faster once we get rid of the requirement to upper-case the input, since at the moment you're having to go over the same sequence twice.
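
To illustrate the point about the extra pass, here is a minimal sketch of a translator that folds case while it looks up each codon, so no upper-cased copy of the sequence is needed. This is not BioJava's implementation: the object name SinglePassTranslate and its methods are made up for illustration. It assumes NCBI translation table 1, reads frame 1 only, and writes 'X' for any codon containing a base other than A, C, G, T or U. Unlike the output quoted below, it keeps stop codons as '*'.

```scala
object SinglePassTranslate {
  // NCBI translation table 1 (standard code), codons ordered by T, C, A, G.
  private val AminoAcids =
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

  // Map a base of either case to 0..3, or -1 for anything else (N, gaps, ...).
  private def baseIndex(c: Char): Int = c match {
    case 'T' | 't' | 'U' | 'u' => 0
    case 'C' | 'c'             => 1
    case 'A' | 'a'             => 2
    case 'G' | 'g'             => 3
    case _                     => -1
  }

  // Translate frame 1 in a single pass; trailing bases that do not
  // form a complete codon are ignored.
  def translate(dna: CharSequence): String = {
    val out = new java.lang.StringBuilder(dna.length / 3)
    var i = 0
    while (i + 2 < dna.length) {
      val a = baseIndex(dna.charAt(i))
      val b = baseIndex(dna.charAt(i + 1))
      val c = baseIndex(dna.charAt(i + 2))
      // Codons with an unrecognised base are reported as 'X'.
      out.append(
        if (a < 0 || b < 0 || c < 0) 'X'
        else AminoAcids.charAt(16 * a + 4 * b + c))
      i += 3
    }
    out.toString
  }
}
```

For example, SinglePassTranslate.translate("atgGCCtaa") gives "MA*" regardless of the case of the input.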

I wonder what the C version does to make itself even faster.

Andy

On 13 Oct 2010, at 17:13, Pjotr Prins wrote:

> On Wed, Oct 13, 2010 at 12:27:05PM +0100, Richard Holland wrote:
>> BJ3 should be replacing most sequence operations with string operations, making the whole thing much faster.
> 
> Good news, BJ3 is a lot faster! The previous version took 2 minutes
> for the C. elegans genome (33 Mb); the BJ3 version takes 27 sec on my
> modest Thinkpad X61 laptop. After parsing the FASTA and turning it
> into an upper-case string, the actual translation takes 16 sec.
> 
> Only the C implementations are faster.
> 
> Here is the relevant Scala code:
> 
> import bio._
> import java.io._
> import org.biojava3.core.sequence._
> import org.biojava3.core.sequence.transcription.TranscriptionEngine
> import org.biojava3.core.sequence.io.IUPACParser
> 
> // <cut> fetching infile from command line...
> 
> IUPACParser.getInstance().getTable(1);  // not sure we need this
> IUPACParser.getInstance().getTable("UNIVERSAL");
> val engine = TranscriptionEngine.getDefault()
> val f = new FastaReader(infile)
> f.foreach { res =>
>   val (id, tag, dna) = res
>   println(">" + id)  // FASTA header line
>   // upper-case first, since the translation currently requires it
>   val dna2 = new DNASequence(dna.mkString.toUpperCase)
>   val rna = dna2.getRNASequence(engine)
>   println(rna.getProteinSequence(engine))
> }
> }
> 
> prints:
> 
> >B0222.10
> MLYWNDLNTVGIVADTIWKYYADQYKRLIKEHSKIRFNPLLHAASVIRIFDIHINLFNNFVTFLLVIFLYIFLIYYVTFFVFPFGPLRVSHWMRFFKIIISRSHLG
> >B0222.11
> MSRRTASKLLVFVFLCSLCFGTQRYDMPRKIDLFNDLITQSTTPASPKCQCLPPTTPSTPPNCIPYDSRLQAASLEEAIVAFPDLTITRQEKTQQSTATLNNCKTKQCRDCYKDLRSQLRKVGLLPGTIDQVFHNQRNFTTCQKYRFARQDKGVYEKKKKAKQHYDWDYVEYDEDEDDDYFWDGLFWKKKRNVLKKIVKRDVEATTAISQPPNSTAMNSTGIIGIRFPISCTTRGVTPDGLGTVSLCSTCWVWRRLPSTYYPAYLNEVVCDYADTSCLSGYASCQTGTQQLNVLRNDSGKLIPISVSAGINCECRLAVGSTLESLVLGQGISKAMPPIDTTSTKPPNLATSTTSHS
> (...)
> 
> Pj.
> 
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/