[Biojava-l] Evolutionary distances

Mark Schreiber markjschreiber at gmail.com
Wed Oct 24 11:59:04 UTC 2007


Hi -

From experience, the best way to optimize Java code is to run a
profiler. The one in NetBeans is quite good.

The reason is that the HotSpot or JIT compilers might natively compile
the part of the code that you think is slow and actually make it
faster than something else, which then becomes the bottleneck. Using a
good profiler you can see how much time is spent in each method and
pinpoint some candidate methods for optimization. You can also see
whether there is a burden due to the creation of lots of objects.
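
If no profiler is at hand, a crude timing loop can at least give a
first impression. The following is only a sketch and no substitute for
a real profiler: the class name CrudeTimer and the method meanNanos are
made up for this example, and the warm-up count is a guess meant to
give the JIT a chance to compile the code before timing starts.

```java
import java.util.function.IntSupplier;

// Hypothetical helper, not part of BioJava or the JDK.
final class CrudeTimer {

    /**
     * Calls the task `warmup` times untimed, then `runs` times timed,
     * and returns the mean wall-clock nanoseconds per timed call.
     */
    static double meanNanos(IntSupplier task, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        int sink = 0; // accumulate results so the work cannot be optimized away
        for (int i = 0; i < warmup; i++) {
            sink += task.getAsInt();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += task.getAsInt();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Integer.MIN_VALUE) {
            System.out.println(sink); // keeps `sink` observably live
        }
        return (double) elapsed / runs;
    }

    public static void main(String[] args) {
        String seq = "ACGTACGTACGTACGTACGT";
        // Example task: count the 'A' characters in a short sequence.
        IntSupplier countA = () -> {
            int n = 0;
            for (int i = 0; i < seq.length(); i++) {
                if (seq.charAt(i) == 'A') n++;
            }
            return n;
        };
        System.out.println(meanNanos(countA, 20000, 100000) + " ns per call");
    }
}
```

The numbers from a loop like this vary from run to run and say nothing
about object creation, which is why a profiler remains the better tool.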

- Mark

On 10/24/07, Andy Yates <ayates at ebi.ac.uk> wrote:
> Our code is very similar but not identical. The original programmer
> shortcut a lot of the else-if conditions by first checking whether the
> two bases are equal. It can then count the transitional changes &
> assume the rest are transversional.
>
> In terms of speed, I can't see an obvious way to speed up either piece
> of code. In our code, replacing the 10 or so calls to String.charAt()
> with two calls & reusing those chars might help, but in all honesty I
> cannot say.
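
To illustrate the shortcut described above, here is a minimal sketch.
It is not the BioJava source: the class K2PSketch and its method names
are invented for the example, and skipping sites with gaps or
ambiguity codes is an assumption taken from the posted K2P method.
Each position is read with exactly two charAt() calls, equal bases are
skipped, and any remaining difference that is not a transition is
counted as a transversion.

```java
// Hypothetical illustration, not the org.biojavax implementation.
final class K2PSketch {

    /** True for an unambiguous upper-case DNA base. */
    private static boolean isBase(char c) {
        return c == 'A' || c == 'C' || c == 'G' || c == 'T';
    }

    /** True for the purines A and G; C and T are pyrimidines. */
    private static boolean isPurine(char c) {
        return c == 'A' || c == 'G';
    }

    /** Kimura 2-parameter distance between two aligned sequences. */
    static double k2p(String s1, String s2) {
        if (s1.length() != s2.length()) {
            throw new IllegalArgumentException("sequences must have equal length");
        }
        long transitions = 0, transversions = 0, sites = 0;
        for (int i = 0; i < s1.length(); i++) {
            // Two charAt() calls per position; the chars are reused below.
            char a = Character.toUpperCase(s1.charAt(i));
            char b = Character.toUpperCase(s2.charAt(i));
            if (!isBase(a) || !isBase(b)) {
                continue; // gap or ambiguity code: not an aligned site
            }
            sites++;
            if (a == b) {
                continue; // identical bases contribute no difference
            }
            if (isPurine(a) == isPurine(b)) {
                transitions++;   // A<->G or C<->T
            } else {
                transversions++; // everything else
            }
        }
        if (sites == 0) {
            return Double.NaN; // nothing to compare
        }
        double p = (double) transitions / sites;
        double q = (double) transversions / sites;
        return -0.5 * Math.log(1.0 - 2.0 * p - q) - 0.25 * Math.log(1.0 - 2.0 * q);
    }
}
```

Whether this is measurably faster than the long else-if chain is
exactly the kind of question a profiler should answer.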
>
> Andy
>
> Richard Holland wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Thanks.
> >
> > Your code is similar to the code we have in
> > org.biojavax.bio.phylo.MultipleHitCorrection. I haven't checked it to
> > see if it is identical, but it probably is.
> >
> > You can call our code like this:
> >
> >  // import statement for biojava phylo stuff
> >  import org.biojavax.bio.phylo.*;
> >
> >  // ...rest of code goes here
> >
> >  // call Kimura2P
> >  String seq1 = ...; // Get seq1 and seq2 from somewhere
> >  String seq2 = ...;
> >  double result = MultipleHitCorrection.Kimura2P(seq1, seq2);
> >
> > Note that our implementation expects sequence strings to be in upper
> > case, so you'll need to make sure your data is upper case or has been
> > converted to upper case before calling our method.
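
A small helper for the upper-case conversion mentioned above might
look like this. The class SequencePrep and the method toUpperDna are
hypothetical names; using Locale.ROOT is a precaution so that the
result does not depend on the default locale of the machine.

```java
import java.util.Locale;

// Hypothetical helper, not part of BioJava.
final class SequencePrep {

    /** Returns the sequence in upper case, independent of the default locale. */
    static String toUpperDna(String seq) {
        return seq.toUpperCase(Locale.ROOT);
    }
}

// Usage with the call quoted above (requires BioJava 1.5 on the classpath):
//   double result = MultipleHitCorrection.Kimura2P(
//           SequencePrep.toUpperDna(seq1), SequencePrep.toUpperDna(seq2));
```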
> >
> > cheers,
> > Richard
> >
> > vineith kaul wrote:
> > >> This is what I have. Thanks a lot for the help.
> >>
> >>
> >> //Method to calculate the Kimura 2 parameter distance
> >> public static double K2P(String sequence1,String sequence2){
> > >>         // p = transitional differences (A<->G & T<->C);
> > >>         // q = transversional differences (A/G<-->C/T)
> > >>         long p=0,q=0,numberOfAlignedSites=0;
> >>
> >>
> >>         char[] seq1array=sequence1.toCharArray();
> >>         char[] seq2array=sequence2.toCharArray();
> >>
> >>         for(int i=0;i<seq1array.length;i++){
> >>                                 // Number of aligned sites
> >>                 if(((seq1array[i]=='a') ||
> >> (seq1array[i]=='A')||(seq1array[i]=='g') ||
> >> (seq1array[i]=='G')||(seq1array[i]=='c') || (seq1array[i]=='C') ||
> >> (seq1array[i]=='t') || (seq1array[i]=='T')) && ((seq2array[i]=='a') ||
> >> (seq2array[i]=='A')||(seq2array[i]=='c') ||
> >> (seq2array[i]=='C')||(seq2array[i]=='t') ||
> >> (seq2array[i]=='T')||(seq2array[i]=='g') || (seq2array[i]=='G'))) {
> >>
> >>                         numberOfAlignedSites++;
> >>                 }
> >>
> >>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> >> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
> >>                         p++;
> >>                 }
> >>                 else
> >>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> >> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
> >>                         p++;
> >>                 }
> >>                 else
> >>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> >> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
> >>                         p++;
> >>                 }
> >>                 else
> >>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> >> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
> >>                         p++;
> >>                 }
> >>                 else
> >>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> >> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
> >>                                 q++;
> >>                         }
> >>                 else
> >>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> >> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
> >>                                 q++;
> >>                         }
> >>                 else
> >>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> >> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
> >>                                         q++;
> >>                                 }
> >>                 else
> >>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> >> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
> >>                                         q++;
> >>                                 }
> >>                 else
> >>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> >> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
> >>                                         q++;
> >>                                 }
> >>                 else
> >>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> >> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
> >>                                         q++;
> >>                                 }
> >>                 else
> >>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> >> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
> >>                                         q++;
> >>                                 }
> >>                 else
> >>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> >> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
> >>                                         q++;
> >>                                 }
> >>
> >>
> >>
> >>
> >>         }
> >>
> >>          double P = 1.0 - (2.0 * ((double)p)/numberOfAlignedSites) -
> >> (((double)q)/numberOfAlignedSites);
> >>          double Q = 1.0 - (2.0 * ((double)q)/numberOfAlignedSites);
> >>          System.out.print(numberOfAlignedSites+"\t"+p+"\t"+q+"\t");
> >>          double dist = (-0.5 * Math.log(P)) - ( 0.25 * Math.log(Q));
> >>          return dist;
> >> }
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> > >> On 10/22/07, Richard Holland <holland at ebi.ac.uk> wrote:
> >>
> > >>     You should take a look at the latest 1.5 release, in the
> > >>     org.biojavax.bio.phylo packages. This code is the beginnings of
> > >>     some phylogenetics code that will perform tasks as you describe.
> > >>     The future plan is to extend this code to cover a wider range of
> > >>     use cases. Kimura2P is already implemented here, in
> > >>     org.biojavax.bio.phylo.MultipleHitCorrection.
> > >>
> > >>     If you can't find code that will do what you want, but have
> > >>     written some before, then please do feel free to contribute it.
> > >>     Even if it is slow, I'm sure someone out there will be able to
> > >>     help optimise it!
> >>
> >>     cheers,
> >>     Richard
> >>
> >>     On Sun, October 21, 2007 5:30 pm, vineith kaul wrote:
> >>     > Hi,
> >>     >
> > >>     > Are there functions to calculate evolutionary pairwise distances like
> > >>     > Kimura2P, Finkelstein etc. in BioJava?
> > >>     > I did write something on my own, but on large sequences it runs terribly
> > >>     > slowly and I am not even sure if it is right.
> >>     > --
> >>     > Vineith Kaul
> >>     > Masters Student Bioinformatics
> >>     > The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
> >>     > Georgia Tech, Atlanta
> >>     > _______________________________________________
> > >>     > Biojava-l mailing list  -   Biojava-l at lists.open-bio.org
> >>     > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>     >
> >>
> >>
> >>     --
> >>     Richard Holland
> >>     BioMart ( http://www.biomart.org/)
> >>     EMBL-EBI
> >>     Hinxton, Cambridgeshire CB10 1SD, UK
> >>
> >>
> >>
> >>
> >> --
> >> Vineith Kaul
> >> Masters Student Bioinformatics
> >> The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
> >> Georgia Tech, Atlanta
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.2.2 (GNU/Linux)
> > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> >
> > iD8DBQFHHvm34C5LeMEKA/QRAlc3AJ9GAMML/z5+BBl12PA2a/Zyz/CHDQCdFWKa
> > 4iKvsyBj2uznhhjTF9EYDFE=
> > =LALE
> > -----END PGP SIGNATURE-----


