[Biojava-l] Evolutionary distances

Mark Schreiber markjschreiber at gmail.com
Wed Oct 24 13:19:25 UTC 2007


Another important consideration after optimization is whether the task can
be multithreaded. Almost all modern computers have at least 2 cores, so
if the algorithm can be parallelized you will get some performance
bonus on most machines.

Modern JVMs map Java threads onto native OS threads, so new threads
spawned by the programmer can run on idle CPUs automatically.
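
For distance calculations the natural unit of parallel work is one pair of
sequences, since each pair is independent of the others. A minimal sketch,
assuming a fixed thread pool sized to the number of available cores: the
class ParallelDistances and the stand-in pDistance method are hypothetical
names for illustration and are not part of BioJava.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper (not part of BioJava): fills a pairwise distance matrix in parallel.
final class ParallelDistances {

    // Stand-in distance: proportion of differing sites. Swap in K2P or any other measure.
    static double pDistance(String a, String b) {
        int n = Math.min(a.length(), b.length());
        if (n == 0) return 0.0;
        int diff = 0;
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) diff++;
        }
        return (double) diff / n;
    }

    // Returns a symmetric matrix; every pair (i, j) with i < j is one independent task.
    static double[][] distanceMatrix(List<String> seqs) {
        int n = seqs.size();
        double[][] d = new double[n][n];
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    final int row = i, col = j;
                    pending.add(pool.submit(() -> {
                        double v = pDistance(seqs.get(row), seqs.get(col));
                        d[row][col] = v; // each task writes only its own two cells
                        d[col][row] = v;
                    }));
                }
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // wait for the task and surface any failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return d;
    }

    public static void main(String[] args) {
        double[][] d = distanceMatrix(Arrays.asList("ACGT", "ACGA", "TCGA"));
        System.out.println(d[0][1] + " " + d[0][2] + " " + d[1][2]);
    }
}
```

Splitting by pair rather than by site keeps each task large enough that the
threading overhead stays small, and no locking is needed because no two
tasks write to the same matrix cell.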

- Mark

On 10/24/07, Andy Yates <ayates at ebi.ac.uk> wrote:
> Yes, a very good point & one I was going to make beforehand but forgot :)
>
> Also, micro-benchmarks/profiling in Java are
> notorious for giving false results due to VM warmup & JIT compilation
> optimisations. There is a framework hosted on Java.net somewhere which
> can perform VM warmups and code iterations to produce more accurate
> benchmarking results; but the name escapes me at the moment.
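
The warm-up effect described above can be illustrated with a hand-rolled
timer. This is only a sketch under stated assumptions: WarmupTimer and
meanNanos are hypothetical names, it is not the framework mentioned in the
message, and a purpose-built benchmarking harness will handle many more
pitfalls than this does.

```java
// Hypothetical sketch: time a task only after a number of untimed warm-up runs,
// so that the JIT compiler has had a chance to compile the hot code.
final class WarmupTimer {

    // Written to by the measured task so the JIT cannot discard the work as dead code.
    static volatile long sink;

    // Runs the task warmupRuns times untimed, then returns mean nanoseconds per call.
    static double meanNanos(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) timedRuns;
    }

    public static void main(String[] args) {
        // Build a 10,000-base test sequence.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("ACGT".charAt(i % 4));
        }
        final String seq = sb.toString();

        Runnable countA = () -> {
            long count = 0;
            for (int i = 0; i < seq.length(); i++) {
                if (seq.charAt(i) == 'A') count++;
            }
            sink = count;
        };

        // The first figure includes interpretation and compilation cost; the second should not.
        System.out.printf("cold: %.0f ns/call%n", meanNanos(countA, 0, 1));
        System.out.printf("warm: %.0f ns/call%n", meanNanos(countA, 5_000, 1_000));
    }
}
```

The absolute numbers depend on the machine and JVM; the point is only that
the cold and warm figures can differ substantially for the same code.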
>
> However looking at this particular code I get the feeling that this is
> about as fast as it's going to get without someone doing bitwise XOR
> operations or some C code ... that's not an open invitation for people
> to start recoding this in C :). At the end of the day the key to
> optimisation is to ask the question "is it fast enough already?". If it
> is then there's no point :)
>
> Andy
>
> Mark Schreiber wrote:
> > Hi -
> >
> > From experience, the best way to optimize Java code is to run a
> > profiler. The one in NetBeans is quite good.
> >
> > The reason is that the HotSpot or JIT compilers might natively compile
> > the part of the code that you think is slow and actually make it
> > faster than something else, which then becomes the bottleneck. Using a
> > good profiler you can see how much time is spent in each method and
> > pinpoint some candidate methods for optimization. You can also see if
> > there is a burden due to the creation of lots of objects.
> >
> > - Mark
> >
> > On 10/24/07, Andy Yates <ayates at ebi.ac.uk> wrote:
> >> Our code is very similar but not identical. The original programmer
> >> shortcutted a lot of else if conditions by considering if the two bases
> >> were equal or not. It can then calculate the transitional changes &
> >> assume the rest are transversional.
> >>
> >> In terms of speed of both pieces of code I can't see an obvious way to
> >> speed it up. In our code, replacing the 10 or so calls to
> >> String.charAt() with two calls & referencing those chars might help,
> >> but in all honesty I cannot say.
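
The shortcut described above (test for equality first, then detect
transitions, and treat everything else as a transversion) can be sketched
as follows, reading each character only once per site. This is an
illustration and not the BioJava source: the class SubstitutionCounts is a
hypothetical name, it assumes upper-case input, and skipping sites that are
not A, C, G or T is an assumption of this sketch.

```java
// Hypothetical sketch (not the BioJava implementation) of the equality/transition shortcut.
final class SubstitutionCounts {
    final long transitions;
    final long transversions;
    final long sites;

    SubstitutionCounts(long transitions, long transversions, long sites) {
        this.transitions = transitions;
        this.transversions = transversions;
        this.sites = sites;
    }

    static boolean isPurine(char c) { return c == 'A' || c == 'G'; }

    static boolean isPyrimidine(char c) { return c == 'C' || c == 'T'; }

    // Assumes upper-case sequences; compares up to the length of the shorter one.
    static SubstitutionCounts count(String s1, String s2) {
        int n = Math.min(s1.length(), s2.length());
        long ts = 0, tv = 0, sites = 0;
        for (int i = 0; i < n; i++) {
            char a = s1.charAt(i); // exactly two charAt calls per site
            char b = s2.charAt(i);
            boolean aPurine = isPurine(a);
            boolean bPurine = isPurine(b);
            // Skip gaps and ambiguity codes (an assumption of this sketch).
            if (!(aPurine || isPyrimidine(a)) || !(bPurine || isPyrimidine(b))) {
                continue;
            }
            sites++;
            if (a == b) {
                continue; // identical bases: no substitution
            }
            if (aPurine == bPurine) {
                ts++; // A<->G or C<->T
            } else {
                tv++; // every remaining mismatch is purine<->pyrimidine
            }
        }
        return new SubstitutionCounts(ts, tv, sites);
    }

    public static void main(String[] args) {
        SubstitutionCounts c = count("AGCTA-", "GACCCA");
        System.out.println(c.sites + "\t" + c.transitions + "\t" + c.transversions);
    }
}
```

Two differing valid bases are either both purines, both pyrimidines, or one
of each, so a single boolean comparison replaces the chain of twelve
explicit base-pair tests.
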
> >>
> >> Andy
> >>
> >> Richard Holland wrote:
> >>> -----BEGIN PGP SIGNED MESSAGE-----
> >>> Hash: SHA1
> >>>
> >>> Thanks.
> >>>
> >>> Your code is similar to the code we have in
> >>> org.biojavax.bio.phylo.MultipleHitCorrection. I haven't checked it to
> >>> see if it is identical, but it probably is.
> >>>
> >>> You can call our code like this:
> >>>
> >>>  // import statement for biojava phylo stuff
> >>>  import org.biojavax.bio.phylo.*;
> >>>
> >>>  // ...rest of code goes here
> >>>
> >>>  // call Kimura2P
> >>>  String seq1 = ...; // Get seq1 and seq2 from somewhere
> >>>  String seq2 = ...;
> >>>  double result = MultipleHitCorrection.Kimura2P(seq1, seq2);
> >>>
> >>> Note that our implementation expects sequence strings to be in upper
> >>> case, so you'll need to make sure your data is upper case or has been
> >>> converted to upper case before calling our method.
> >>>
> >>> cheers,
> >>> Richard
> >>>
> >>> vineith kaul wrote:
> >>>> This is what I have ..... Thanks a lot for the help.
> >>>>
> >>>>
> >>>> //Method to calculate the Kimura 2 parameter distance
> >>>> public static double K2P(String sequence1,String sequence2){
> >>>>         // p = transitional differences (A<->G & T<->C);
> >>>>         // q = transversional differences (A/G<-->C/T)
> >>>>         long p=0,q=0,numberOfAlignedSites=0;
> >>>>
> >>>>
> >>>>         char[] seq1array=sequence1.toCharArray();
> >>>>         char[] seq2array=sequence2.toCharArray();
> >>>>
> >>>>         for(int i=0;i<seq1array.length;i++){
> >>>>                                 // Number of aligned sites
> >>>>                 if(((seq1array[i]=='a') ||
> >>>> (seq1array[i]=='A')||(seq1array[i]=='g') ||
> >>>> (seq1array[i]=='G')||(seq1array[i]=='c') || (seq1array[i]=='C') ||
> >>>> (seq1array[i]=='t') || (seq1array[i]=='T')) && ((seq2array[i]=='a') ||
> >>>> (seq2array[i]=='A')||(seq2array[i]=='c') ||
> >>>> (seq2array[i]=='C')||(seq2array[i]=='t') ||
> >>>> (seq2array[i]=='T')||(seq2array[i]=='g') || (seq2array[i]=='G'))) {
> >>>>
> >>>>                         numberOfAlignedSites++;
> >>>>                 }
> >>>>
> >>>>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> >>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
> >>>>                         p++;
> >>>>                 }
> >>>>                 else
> >>>>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> >>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
> >>>>                         p++;
> >>>>                 }
> >>>>                 else
> >>>>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> >>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
> >>>>                         p++;
> >>>>                 }
> >>>>                 else
> >>>>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> >>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
> >>>>                         p++;
> >>>>                 }
> >>>>                 else
> >>>>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> >>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
> >>>>                                 q++;
> >>>>                         }
> >>>>                 else
> >>>>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> >>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
> >>>>                                 q++;
> >>>>                         }
> >>>>                 else
> >>>>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> >>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
> >>>>                                         q++;
> >>>>                                 }
> >>>>                 else
> >>>>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> >>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
> >>>>                                         q++;
> >>>>                                 }
> >>>>                 else
> >>>>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> >>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
> >>>>                                         q++;
> >>>>                                 }
> >>>>                 else
> >>>>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> >>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
> >>>>                                         q++;
> >>>>                                 }
> >>>>                 else
> >>>>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> >>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
> >>>>                                         q++;
> >>>>                                 }
> >>>>                 else
> >>>>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> >>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
> >>>>                                         q++;
> >>>>                                 }
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>         }
> >>>>
> >>>>          double P = 1.0 - (2.0 * ((double)p)/numberOfAlignedSites) -
> >>>> (((double)q)/numberOfAlignedSites);
> >>>>          double Q = 1.0 - (2.0 * ((double)q)/numberOfAlignedSites);
> >>>>          System.out.print(numberOfAlignedSites+"\t"+p+"\t"+q+"\t");
> >>>>          double dist = (-0.5 * Math.log(P)) - ( 0.25 * Math.log(Q));
> >>>>          return dist;
> >>>> }
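
For reference, the last lines of the method above correspond to the usual
form of the Kimura 2-parameter distance. Writing n for
numberOfAlignedSites, and P = p/n and Q = q/n for the proportions of
transitional and transversional differences (note that the Java variables
named P and Q in the method hold the two logarithm arguments, not these
proportions):

```latex
\[
  d \;=\; -\tfrac{1}{2}\,\ln\!\left(1 - 2P - Q\right)
          \;-\; \tfrac{1}{4}\,\ln\!\left(1 - 2Q\right),
  \qquad P = \frac{p}{n}, \quad Q = \frac{q}{n}.
\]
```

As a worked example with made-up counts n = 100, p = 10 and q = 5: the
arguments are 1 - 0.20 - 0.05 = 0.75 and 1 - 0.10 = 0.90, giving
d = -0.5 ln(0.75) - 0.25 ln(0.90), which is approximately 0.1702. The
distance is only defined while both arguments are positive; for very
divergent sequences, or when n is 0, the method returns NaN or infinity.
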
> >>>>
> >>>> On 10/22/07, Richard Holland <holland at ebi.ac.uk> wrote:
> >>>>
> >>>>     You should take a look at the latest 1.5 release, in the
> >>>>     org.biojavax.bio.phylo packages. This code is the beginnings of some
> >>>>     phylogenetics code that will perform tasks as you describe. The future
> >>>>     plan is to extend this code to cover a wider range of use cases.
> >>>>     Kimura2P is already implemented here, in
> >>>>     org.biojavax.bio.phylo.MultipleHitCorrection.
> >>>>
> >>>>     If you can't find code that will do what you want, but have written some
> >>>>     before, then please do feel free to contribute it. Even if it is
> >>>>     slow, I'm sure someone out there will be able to help optimise it!
> >>>>
> >>>>     cheers,
> >>>>     Richard
> >>>>
> >>>>     On Sun, October 21, 2007 5:30 pm, vineith kaul wrote:
> >>>>     > Hi,
> >>>>     >
> >>>>     > Are there functions to calculate evolutionary pairwise distances like
> >>>>     > Kimura2P, Finkelstein etc. in BioJava?
> >>>>     > I did write something on my own, but on large sequences it runs terribly
> >>>>     > slowly and I am not even sure if it's right.
> >>>>     > --
> >>>>     > Vineith Kaul
> >>>>     > Masters Student Bioinformatics
> >>>>     > The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
> >>>>     > Georgia Tech, Atlanta
> >>>>     > _______________________________________________
> >>>>     > Biojava-l mailing list  -   Biojava-l at lists.open-bio.org
> >>>>     > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>>>     >
> >>>>
> >>>>
> >>>>     --
> >>>>     Richard Holland
> >>>>     BioMart ( http://www.biomart.org/)
> >>>>     EMBL-EBI
> >>>>     Hinxton, Cambridgeshire CB10 1SD, UK
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Vineith Kaul
> >>>> Masters Student Bioinformatics
> >>>> The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
> >>>> Georgia Tech, Atlanta
> >>> -----BEGIN PGP SIGNATURE-----
> >>> Version: GnuPG v1.4.2.2 (GNU/Linux)
> >>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> >>>
> >>> iD8DBQFHHvm34C5LeMEKA/QRAlc3AJ9GAMML/z5+BBl12PA2a/Zyz/CHDQCdFWKa
> >>> 4iKvsyBj2uznhhjTF9EYDFE=
> >>> =LALE
> >>> -----END PGP SIGNATURE-----
> >>
>


