[Biojava-l] Evolutionary distances

Mark Schreiber markjschreiber at gmail.com
Wed Oct 24 13:48:00 UTC 2007


It appears it is as simple as:

Runtime.getRuntime().availableProcessors();

- Mark

On 10/24/07, Mark Schreiber <markjschreiber at gmail.com> wrote:
> I'm not aware of a way to determine the number of CPU's within a
> program although possibly it is one the the environment variables
> available from System.
>
> Even if it can't be determined there could be a method argument to
> specify the number of threads to spawn.
>
> - Mark
>
> On 10/24/07, Richard Holland <holland at ebi.ac.uk> wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > This particular code could easily be parallelised - given N threads, you
> > can simply divide the input into N chunks and get each thread to process
> > 1/Nth of the input. You then combine the output of each thread to do the
> > final calculation.
> >
> > But, it'd be bad practice to always fork a predetermined N threads for a
> > given task. It'd be much better to somehow be able to ask 'how parallel
> > can I make this?' at runtime by checking system resources, or maybe get
> > the parallel-savvy user to set an optional BioJava-wide parallelisation
> > hint. N could then be determined and the task divided appropriately.
> >
> > cheers,
> > Richard
> >
> > Mark Schreiber wrote:
> > > Another important consideration after optimization is can the task be
> > > multithreaded?  Almost all modern computers have at least 2 cores. So
> > > if the algorithm can be parallelized you will get some performance
> > > bonus on most machines.
> > >
> > > Modern JVM's will automagically try to use idle CPU's to execute new
> > > threads spawned by the programmer.
> > >
> > > - Mark
> > >
> > > On 10/24/07, Andy Yates <ayates at ebi.ac.uk> wrote:
> > >> Yes a very good point & one I was going to make before hand but forgot :)
> > >>
> > >> Also not to mention that micro-benchmarks/profiling in Java are
> > >> notorious for giving false results due to VM warmup & JIT compilation
> > >> optimisations. There is a framework hosted on Java.net somewhere which
> > >> can perform VM warmups and code iterations to produce more accurate
> > >> benchmarking results; but the name escapes me at the moment.
> > >>
> > >> However looking at this particular code I get the feeling that this is
> > >> about as fast as its going to get without someone doing bitwise XOR
> > >> operations or some C code ... that's not an open invitation for people
> > >> to start recoding this in C :). At the end of the day the key to
> > >> optimisation is to ask the question "is it fast enough already?". If it
> > >> is then there's no point :)
> > >>
> > >> Andy
> > >>
> > >> Mark Schreiber wrote:
> > >>> Hi -
> > >>>
> > >>> >From experience the best way to optimize java code is to run a
> > >>> profiler. The one in Netbeans is quite good.
> > >>>
> > >>> The reason is that the hotspot or JIT compilers might natively compile
> > >>> the part of the code that you think is slow and actually make it
> > >>> faster than something else which becomes the bottle neck. Using a good
> > >>> profiler you can detect how much time is spent in each method and pin
> > >>> point some candidate methods for optimization. You can also see if
> > >>> there is a burden due to creation of lots of objects.
> > >>>
> > >>> - Mark
> > >>>
> > >>> On 10/24/07, Andy Yates <ayates at ebi.ac.uk> wrote:
> > >>>> Our code is very similar but not identical. The original programmer
> > >>>> shortcutted a lot of else if conditions by considering if the two bases
> > >>>> were equal or not. It can then calculate the transitional changes &
> > >>>> assume the rest are transversional.
> > >>>>
> > >>>> In terms of speed of both pieces of code I can't see an obvious way to
> > >>>> speed it up. Probably in our code removing the 10 or so calls to
> > >>>> String.charAt() with a two calls & referencing those chars might help
> > >>>> but in all honesty I cannot say.
> > >>>>
> > >>>> Andy
> > >>>>
> > >>>> Richard Holland wrote:
> > > Thanks.
> > >
> > > Your code is similar to the code we have in
> > > org.biojavax.bio.phylo.MultipleHitCorrection. I haven't checked it to
> > > see if it is identical, but it probably is.
> > >
> > > You can call our code like this:
> > >
> > >  // import statement for biojava phylo stuff
> > >  import org.biojavax.bio.phylo.*;
> > >
> > >  // ...rest of code goes here
> > >
> > >  // call Kimura2P
> > >  String seq1 = ...; // Get seq1 and seq2 from somewhere
> > >  String seq2 = ...;
> > >  double result = MultipleHitCorrection.Kimura2P(seq1, seq2);
> > >
> > > Note that our implementation expects sequence strings to be in upper
> > > case, so you'll need to make sure your data is upper case or has been
> > > converted to upper case before calling our method.
> > >
> > > cheers,
> > > Richard
> > >
> > > vineith kaul wrote:
> > >>>>>>> This is what I have .....Thanks a lot  fr the help.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> //Method to calculate the Kimura 2 parameter distance
> > >>>>>>> public static double K2P(String sequence1,String sequence2){
> > >>>>>>>         long p=0,q=0,numberOfAlignedSites=0; // P= transitional
> > >>>>>>> differences (A<->G & T<->C) ; Q= transversional differences (A/G<-->C/T)
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>         char[] seq1array=sequence1.toCharArray();
> > >>>>>>>         char[] seq2array=sequence2.toCharArray();
> > >>>>>>>
> > >>>>>>>         for(int i=0;i<seq1array.length;i++){
> > >>>>>>>                                 // Number of aligned sites
> > >>>>>>>                 if(((seq1array[i]=='a') ||
> > >>>>>>> (seq1array[i]=='A')||(seq1array[i]=='g') ||
> > >>>>>>> (seq1array[i]=='G')||(seq1array[i]=='c') || (seq1array[i]=='C') ||
> > >>>>>>> (seq1array[i]=='t') || (seq1array[i]=='T')) && ((seq2array[i]=='a') ||
> > >>>>>>> (seq2array[i]=='A')||(seq2array[i]=='c') ||
> > >>>>>>> (seq2array[i]=='C')||(seq2array[i]=='t') ||
> > >>>>>>> (seq2array[i]=='T')||(seq2array[i]=='g') || (seq2array[i]=='G'))) {
> > >>>>>>>
> > >>>>>>>                         numberOfAlignedSites++;
> > >>>>>>>                 }
> > >>>>>>>
> > >>>>>>>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> > >>>>>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
> > >>>>>>>                         p++;
> > >>>>>>>                 }
> > >>>>>>>                 else
> > >>>>>>>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> > >>>>>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
> > >>>>>>>                         p++;
> > >>>>>>>                 }
> > >>>>>>>                 else
> > >>>>>>>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> > >>>>>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
> > >>>>>>>                         p++;
> > >>>>>>>                 }
> > >>>>>>>                 else
> > >>>>>>>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> > >>>>>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
> > >>>>>>>                         p++;
> > >>>>>>>                 }
> > >>>>>>>                 else
> > >>>>>>>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> > >>>>>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
> > >>>>>>>                                 q++;
> > >>>>>>>                         }
> > >>>>>>>                 else
> > >>>>>>>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> > >>>>>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
> > >>>>>>>                                 q++;
> > >>>>>>>                         }
> > >>>>>>>                 else
> > >>>>>>>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> > >>>>>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
> > >>>>>>>                                         q++;
> > >>>>>>>                                 }
> > >>>>>>>                 else
> > >>>>>>>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> > >>>>>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
> > >>>>>>>                                         q++;
> > >>>>>>>                                 }
> > >>>>>>>                 else
> > >>>>>>>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> > >>>>>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
> > >>>>>>>                                         q++;
> > >>>>>>>                                 }
> > >>>>>>>                 else
> > >>>>>>>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> > >>>>>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
> > >>>>>>>                                         q++;
> > >>>>>>>                                 }
> > >>>>>>>                 else
> > >>>>>>>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> > >>>>>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
> > >>>>>>>                                         q++;
> > >>>>>>>                                 }
> > >>>>>>>                 else
> > >>>>>>>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> > >>>>>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
> > >>>>>>>                                         q++;
> > >>>>>>>                                 }
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>         }
> > >>>>>>>
> > >>>>>>>          double P = 1.0 - (2.0 * ((double)p)/numberOfAlignedSites) -
> > >>>>>>> (((double)q)/numberOfAlignedSites);
> > >>>>>>>          double Q = 1.0 - (2.0 * ((double)q)/numberOfAlignedSites);
> > >>>>>>>          System.out.print(numberOfAlignedSites+"\t"+p+"\t"+q+"\t");
> > >>>>>>>          double dist = (-0.5 * Math.log(P)) - ( 0.25 * Math.log(Q));
> > >>>>>>>          return dist;
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 10/22/07, *Richard Holland* <holland at ebi.ac.uk
> > >>>>>>> <mailto:holland at ebi.ac.uk>> wrote:
> > >>>>>>>
> > >>>>>>>     You should take a look at the latest 1.5 release, in the
> > >>>>>>>     org.biojavax.bio.phylo packages. This code is the beginnings of some
> > >>>>>>>     phylogenetics code that will perform tasks as you describe. The future
> > >>>>>>>     plan is to extend this code to cover a wider range of use cases.
> > >>>>>>>     Kimura2P
> > >>>>>>>     is already implemented here, in
> > >>>>>>>     org.biojavax.bio.phylo.MultipleHitCorrection.
> > >>>>>>>
> > >>>>>>>     If you can't find code that will do what you want, but have written some
> > >>>>>>>     before, then please do feel free to contribute it. Even if it is
> > >>>>>>>     slow, I'm
> > >>>>>>>     sure someone out there will be able to help optimise it!
> > >>>>>>>
> > >>>>>>>     cheers,
> > >>>>>>>     Richard
> > >>>>>>>
> > >>>>>>>     On Sun, October 21, 2007 5:30 pm, vineith kaul wrote:
> > >>>>>>>     > Hi,
> > >>>>>>>     >
> > >>>>>>>     > Are there functions to calculate evolutionary pairwise distances like
> > >>>>>>>     > Kimura2P,Finkelstein etc in Biojava
> > >>>>>>>     > I did write smthng on my own but on large sequences it runs terribly
> > >>>>>>>     > slow and I am not even sure if thats right.
> > >>>>>>>     > --
> > >>>>>>>     > Vineith Kaul
> > >>>>>>>     > Masters Student Bioinformatics
> > >>>>>>>     > The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
> > >>>>>>>     > Georgia Tech, Atlanta
> > >>>>>>>     > _______________________________________________
> > >>>>>>>     > Biojava-l mailing list  -   Biojava-l at lists.open-bio.org
> > >>>>>>>     <mailto:Biojava-l at lists.open-bio.org>
> > >>>>>>>     > http://lists.open-bio.org/mailman/listinfo/biojava-l
> > >>>>>>>     >
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>     --
> > >>>>>>>     Richard Holland
> > >>>>>>>     BioMart ( http://www.biomart.org/)
> > >>>>>>>     EMBL-EBI
> > >>>>>>>     Hinxton, Cambridgeshire CB10 1SD, UK
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Vineith Kaul
> > >>>>>>> Masters Student Bioinformatics
> > >>>>>>> The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
> > >>>>>>> Georgia Tech, Atlanta
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> > >>>> _______________________________________________
> > >>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > >>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> > >>>>
> >
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.2.2 (GNU/Linux)
> > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> >
> > iD8DBQFHH0nB4C5LeMEKA/QRAjW9AJwPcaHByaQhAQRPLJOZt4gQRF0TbgCeIa6P
> > IEyRleSs1+AziCvfhcES8wI=
> > =uLDm
> > -----END PGP SIGNATURE-----
> >
>



More information about the Biojava-l mailing list