[Biojava-l] Evolutionary distances

Mark Schreiber markjschreiber at gmail.com
Wed Oct 24 13:41:16 UTC 2007


I'm not aware of a way to determine the number of CPU's within a
program although possibly it is one the the environment variables
available from System.

Even if it can't be determined there could be a method argument to
specify the number of threads to spawn.

- Mark

On 10/24/07, Richard Holland <holland at ebi.ac.uk> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> This particular code could easily be parallelised - given N threads, you
> can simply divide the input into N chunks and get each thread to process
> 1/Nth of the input. You then combine the output of each thread to do the
> final calculation.
>
> But, it'd be bad practice to always fork a predetermined N threads for a
> given task. It'd be much better to somehow be able to ask 'how parallel
> can I make this?' at runtime by checking system resources, or maybe get
> the parallel-savvy user to set an optional BioJava-wide parallelisation
> hint. N could then be determined and the task divided appropriately.
>
> cheers,
> Richard
>
> Mark Schreiber wrote:
> > Another important consideration after optimization is can the task be
> > multithreaded?  Almost all modern computers have at least 2 cores. So
> > if the algorithm can be parallelized you will get some performance
> > bonus on most machines.
> >
> > Modern JVM's will automagically try to use idle CPU's to execute new
> > threads spawned by the programmer.
> >
> > - Mark
> >
> > On 10/24/07, Andy Yates <ayates at ebi.ac.uk> wrote:
> >> Yes a very good point & one I was going to make before hand but forgot :)
> >>
> >> Also not to mention that micro-benchmarks/profiling in Java are
> >> notorious for giving false results due to VM warmup & JIT compilation
> >> optimisations. There is a framework hosted on Java.net somewhere which
> >> can perform VM warmups and code iterations to produce more accurate
> >> benchmarking results; but the name escapes me at the moment.
> >>
> >> However looking at this particular code I get the feeling that this is
> >> about as fast as its going to get without someone doing bitwise XOR
> >> operations or some C code ... that's not an open invitation for people
> >> to start recoding this in C :). At the end of the day the key to
> >> optimisation is to ask the question "is it fast enough already?". If it
> >> is then there's no point :)
> >>
> >> Andy
> >>
> >> Mark Schreiber wrote:
> >>> Hi -
> >>>
> >>> >From experience the best way to optimize java code is to run a
> >>> profiler. The one in Netbeans is quite good.
> >>>
> >>> The reason is that the hotspot or JIT compilers might natively compile
> >>> the part of the code that you think is slow and actually make it
> >>> faster than something else which becomes the bottle neck. Using a good
> >>> profiler you can detect how much time is spent in each method and pin
> >>> point some candidate methods for optimization. You can also see if
> >>> there is a burden due to creation of lots of objects.
> >>>
> >>> - Mark
> >>>
> >>> On 10/24/07, Andy Yates <ayates at ebi.ac.uk> wrote:
> >>>> Our code is very similar but not identical. The original programmer
> >>>> shortcutted a lot of else if conditions by considering if the two bases
> >>>> were equal or not. It can then calculate the transitional changes &
> >>>> assume the rest are transversional.
> >>>>
> >>>> In terms of speed of both pieces of code I can't see an obvious way to
> >>>> speed it up. Probably in our code removing the 10 or so calls to
> >>>> String.charAt() with a two calls & referencing those chars might help
> >>>> but in all honesty I cannot say.
> >>>>
> >>>> Andy
> >>>>
> >>>> Richard Holland wrote:
> > Thanks.
> >
> > Your code is similar to the code we have in
> > org.biojavax.bio.phylo.MultipleHitCorrection. I haven't checked it to
> > see if it is identical, but it probably is.
> >
> > You can call our code like this:
> >
> >  // import statement for biojava phylo stuff
> >  import org.biojavax.bio.phylo.*;
> >
> >  // ...rest of code goes here
> >
> >  // call Kimura2P
> >  String seq1 = ...; // Get seq1 and seq2 from somewhere
> >  String seq2 = ...;
> >  double result = MultipleHitCorrection.Kimura2P(seq1, seq2);
> >
> > Note that our implementation expects sequence strings to be in upper
> > case, so you'll need to make sure your data is upper case or has been
> > converted to upper case before calling our method.
> >
> > cheers,
> > Richard
> >
> > vineith kaul wrote:
> >>>>>>> This is what I have .....Thanks a lot  fr the help.
> >>>>>>>
> >>>>>>>
> >>>>>>> //Method to calculate the Kimura 2 parameter distance
> >>>>>>> public static double K2P(String sequence1,String sequence2){
> >>>>>>>         long p=0,q=0,numberOfAlignedSites=0; // P= transitional
> >>>>>>> differences (A<->G & T<->C) ; Q= transversional differences (A/G<-->C/T)
> >>>>>>>
> >>>>>>>
> >>>>>>>         char[] seq1array=sequence1.toCharArray();
> >>>>>>>         char[] seq2array=sequence2.toCharArray();
> >>>>>>>
> >>>>>>>         for(int i=0;i<seq1array.length;i++){
> >>>>>>>                                 // Number of aligned sites
> >>>>>>>                 if(((seq1array[i]=='a') ||
> >>>>>>> (seq1array[i]=='A')||(seq1array[i]=='g') ||
> >>>>>>> (seq1array[i]=='G')||(seq1array[i]=='c') || (seq1array[i]=='C') ||
> >>>>>>> (seq1array[i]=='t') || (seq1array[i]=='T')) && ((seq2array[i]=='a') ||
> >>>>>>> (seq2array[i]=='A')||(seq2array[i]=='c') ||
> >>>>>>> (seq2array[i]=='C')||(seq2array[i]=='t') ||
> >>>>>>> (seq2array[i]=='T')||(seq2array[i]=='g') || (seq2array[i]=='G'))) {
> >>>>>>>
> >>>>>>>                         numberOfAlignedSites++;
> >>>>>>>                 }
> >>>>>>>
> >>>>>>>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> >>>>>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
> >>>>>>>                         p++;
> >>>>>>>                 }
> >>>>>>>                 else
> >>>>>>>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> >>>>>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
> >>>>>>>                         p++;
> >>>>>>>                 }
> >>>>>>>                 else
> >>>>>>>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> >>>>>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
> >>>>>>>                         p++;
> >>>>>>>                 }
> >>>>>>>                 else
> >>>>>>>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> >>>>>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
> >>>>>>>                         p++;
> >>>>>>>                 }
> >>>>>>>                 else
> >>>>>>>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> >>>>>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
> >>>>>>>                                 q++;
> >>>>>>>                         }
> >>>>>>>                 else
> >>>>>>>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> >>>>>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
> >>>>>>>                                 q++;
> >>>>>>>                         }
> >>>>>>>                 else
> >>>>>>>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> >>>>>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
> >>>>>>>                                         q++;
> >>>>>>>                                 }
> >>>>>>>                 else
> >>>>>>>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> >>>>>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
> >>>>>>>                                         q++;
> >>>>>>>                                 }
> >>>>>>>                 else
> >>>>>>>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> >>>>>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
> >>>>>>>                                         q++;
> >>>>>>>                                 }
> >>>>>>>                 else
> >>>>>>>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> >>>>>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
> >>>>>>>                                         q++;
> >>>>>>>                                 }
> >>>>>>>                 else
> >>>>>>>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> >>>>>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
> >>>>>>>                                         q++;
> >>>>>>>                                 }
> >>>>>>>                 else
> >>>>>>>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> >>>>>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
> >>>>>>>                                         q++;
> >>>>>>>                                 }
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>         }
> >>>>>>>
> >>>>>>>          double P = 1.0 - (2.0 * ((double)p)/numberOfAlignedSites) -
> >>>>>>> (((double)q)/numberOfAlignedSites);
> >>>>>>>          double Q = 1.0 - (2.0 * ((double)q)/numberOfAlignedSites);
> >>>>>>>          System.out.print(numberOfAlignedSites+"\t"+p+"\t"+q+"\t");
> >>>>>>>          double dist = (-0.5 * Math.log(P)) - ( 0.25 * Math.log(Q));
> >>>>>>>          return dist;
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 10/22/07, *Richard Holland* <holland at ebi.ac.uk
> >>>>>>> <mailto:holland at ebi.ac.uk>> wrote:
> >>>>>>>
> >>>>>>>     You should take a look at the latest 1.5 release, in the
> >>>>>>>     org.biojavax.bio.phylo packages. This code is the beginnings of some
> >>>>>>>     phylogenetics code that will perform tasks as you describe. The future
> >>>>>>>     plan is to extend this code to cover a wider range of use cases.
> >>>>>>>     Kimura2P
> >>>>>>>     is already implemented here, in
> >>>>>>>     org.biojavax.bio.phylo.MultipleHitCorrection.
> >>>>>>>
> >>>>>>>     If you can't find code that will do what you want, but have written some
> >>>>>>>     before, then please do feel free to contribute it. Even if it is
> >>>>>>>     slow, I'm
> >>>>>>>     sure someone out there will be able to help optimise it!
> >>>>>>>
> >>>>>>>     cheers,
> >>>>>>>     Richard
> >>>>>>>
> >>>>>>>     On Sun, October 21, 2007 5:30 pm, vineith kaul wrote:
> >>>>>>>     > Hi,
> >>>>>>>     >
> >>>>>>>     > Are there functions to calculate evolutionary pairwise distances like
> >>>>>>>     > Kimura2P,Finkelstein etc in Biojava
> >>>>>>>     > I did write smthng on my own but on large sequences it runs terribly
> >>>>>>>     > slow and I am not even sure if thats right.
> >>>>>>>     > --
> >>>>>>>     > Vineith Kaul
> >>>>>>>     > Masters Student Bioinformatics
> >>>>>>>     > The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
> >>>>>>>     > Georgia Tech, Atlanta
> >>>>>>>     > _______________________________________________
> >>>>>>>     > Biojava-l mailing list  -   Biojava-l at lists.open-bio.org
> >>>>>>>     <mailto:Biojava-l at lists.open-bio.org>
> >>>>>>>     > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>>>>>>     >
> >>>>>>>
> >>>>>>>
> >>>>>>>     --
> >>>>>>>     Richard Holland
> >>>>>>>     BioMart ( http://www.biomart.org/)
> >>>>>>>     EMBL-EBI
> >>>>>>>     Hinxton, Cambridgeshire CB10 1SD, UK
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Vineith Kaul
> >>>>>>> Masters Student Bioinformatics
> >>>>>>> The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
> >>>>>>> Georgia Tech, Atlanta
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>>> _______________________________________________
> >>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>>>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFHH0nB4C5LeMEKA/QRAjW9AJwPcaHByaQhAQRPLJOZt4gQRF0TbgCeIa6P
> IEyRleSs1+AziCvfhcES8wI=
> =uLDm
> -----END PGP SIGNATURE-----
>



More information about the Biojava-l mailing list