[Biojava-l] Evolutionary distances

Richard Holland holland at ebi.ac.uk
Wed Oct 24 07:52:24 UTC 2007


Thanks.

Your code is similar to the code we have in
org.biojavax.bio.phylo.MultipleHitCorrection. I haven't checked it to
see if it is identical, but it probably is.
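
For reference, the Kimura two-parameter distance is usually written as
d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the
proportions of aligned sites that differ by a transition and by a
transversion; that is the expression the last lines of your method
evaluate. Below is a minimal, self-contained sketch of the same
calculation. It is not the BioJava implementation, and the names
K2PSketch and distance are made up for illustration. It assumes the two
strings are already aligned and of equal length, and it skips any column
where either character is not A, C, G or T.

```java
import java.util.Locale;

/** Illustrative Kimura 2-parameter sketch; NOT the BioJava code. */
class K2PSketch {

    /** True if the (upper-case) character is one of A, C, G, T. */
    private static boolean isNucleotide(char base) {
        return base == 'A' || base == 'C' || base == 'G' || base == 'T';
    }

    /** True for purines (A, G); call only on A, C, G or T. */
    private static boolean isPurine(char base) {
        return base == 'A' || base == 'G';
    }

    /** Hypothetical helper: K2P distance between two aligned sequences. */
    static double distance(String sequence1, String sequence2) {
        if (sequence1.length() != sequence2.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        // Normalise case once instead of testing both cases at every site.
        String s1 = sequence1.toUpperCase(Locale.ROOT);
        String s2 = sequence2.toUpperCase(Locale.ROOT);

        long transitions = 0, transversions = 0, sites = 0;
        for (int i = 0; i < s1.length(); i++) {
            char a = s1.charAt(i);
            char b = s2.charAt(i);
            if (!isNucleotide(a) || !isNucleotide(b)) {
                continue; // gap or ambiguity code: not an aligned site
            }
            sites++;
            if (a == b) {
                continue; // identical bases
            }
            // Purine<->purine or pyrimidine<->pyrimidine is a transition,
            // anything else is a transversion.
            if (isPurine(a) == isPurine(b)) {
                transitions++;
            } else {
                transversions++;
            }
        }
        if (sites == 0) {
            throw new IllegalArgumentException("no comparable sites");
        }
        double p = (double) transitions / sites;
        double q = (double) transversions / sites;
        // Math.log returns NaN or -Infinity when its argument is <= 0
        // (very divergent sequences); this sketch does not handle that case.
        return -0.5 * Math.log(1.0 - 2.0 * p - q) - 0.25 * Math.log(1.0 - 2.0 * q);
    }

    public static void main(String[] args) {
        System.out.println(distance("AAAAAAAAAA", "GAAAAAAAAA"));
    }
}
```

Classifying each base as purine or pyrimidine replaces the long chain of
character comparisons with one test per site, which may also help a
little with speed on long sequences.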

You can call our code like this:

 // import statement for biojava phylo stuff
 import org.biojavax.bio.phylo.*;

 // ...rest of code goes here

 // call Kimura2P
 String seq1 = ...; // Get seq1 and seq2 from somewhere
 String seq2 = ...;
 double result = MultipleHitCorrection.Kimura2P(seq1, seq2);

Note that our implementation expects sequence strings to be in upper
case, so you'll need to make sure your data is upper case or has been
converted to upper case before calling our method.
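
A minimal sketch of that conversion step is below. The class and method
names (SequenceCase, toUpperCaseSequence) are hypothetical; the only
library call used is the standard String.toUpperCase(Locale). The call
into MultipleHitCorrection appears only in a comment so that the snippet
compiles without BioJava on the classpath.

```java
import java.util.Locale;

/** Illustrative helper for upper-casing sequence strings. */
class SequenceCase {

    /**
     * Returns the sequence in upper case. Locale.ROOT avoids
     * locale-specific case mappings (for example the Turkish dotted I).
     */
    static String toUpperCaseSequence(String sequence) {
        return sequence.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String seq1 = toUpperCaseSequence("acgtAcgt");
        String seq2 = toUpperCaseSequence("acgtacgA");
        System.out.println(seq1 + " " + seq2);
        // The converted strings could then be passed on, e.g.:
        // double result = MultipleHitCorrection.Kimura2P(seq1, seq2);
    }
}
```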

cheers,
Richard

vineith kaul wrote:
> This is what I have. Thanks a lot for the help.
> 
> 
> //Method to calculate the Kimura 2 parameter distance
> public static double K2P(String sequence1,String sequence2){
>         // p = transitional differences (A<->G & T<->C);
>         // q = transversional differences (A/G<-->C/T)
>         long p=0,q=0,numberOfAlignedSites=0;
> 
> 
>         char[] seq1array=sequence1.toCharArray();
>         char[] seq2array=sequence2.toCharArray();
> 
>         for(int i=0;i<seq1array.length;i++){
>                                 // Number of aligned sites
>                 if(((seq1array[i]=='a') ||
> (seq1array[i]=='A')||(seq1array[i]=='g') ||
> (seq1array[i]=='G')||(seq1array[i]=='c') || (seq1array[i]=='C') ||
> (seq1array[i]=='t') || (seq1array[i]=='T')) && ((seq2array[i]=='a') ||
> (seq2array[i]=='A')||(seq2array[i]=='c') ||
> (seq2array[i]=='C')||(seq2array[i]=='t') ||
> (seq2array[i]=='T')||(seq2array[i]=='g') || (seq2array[i]=='G'))) {
> 
>                         numberOfAlignedSites++;
>                 }
> 
>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>                         p++;
>                 }
>                 else
>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>                         p++;
>                 }
>                 else
>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>                         p++;
>                 }
>                 else
>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>                         p++;
>                 }
>                 else
>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>                                 q++;
>                         }
>                 else
>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>                                 q++;
>                         }
>                 else
>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>                                         q++;
>                                 }
>                 else
>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>                                         q++;
>                                 }
>                 else
>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>                                         q++;
>                                 }
>                 else
>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>                                         q++;
>                                 }
>                 else
>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>                                         q++;
>                                 }
>                 else
>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>                                         q++;
>                                 }
> 
> 
> 
> 
>         }
> 
>          double P = 1.0 - (2.0 * ((double)p)/numberOfAlignedSites) -
> (((double)q)/numberOfAlignedSites);
>          double Q = 1.0 - (2.0 * ((double)q)/numberOfAlignedSites);
>          System.out.print(numberOfAlignedSites+"\t"+p+"\t"+q+"\t");
>          double dist = (-0.5 * Math.log(P)) - ( 0.25 * Math.log(Q));
>          return dist;
> }
> 
> On 10/22/07, Richard Holland <holland at ebi.ac.uk> wrote:
> 
>     You should take a look at the latest 1.5 release, in the
>     org.biojavax.bio.phylo packages. This is the beginning of some
>     phylogenetics code that will perform tasks such as the ones you
>     describe. The plan is to extend it to cover a wider range of use
>     cases. Kimura2P is already implemented there, in
>     org.biojavax.bio.phylo.MultipleHitCorrection.
> 
>     If you can't find code that will do what you want, but have written some
>     before, then please do feel free to contribute it. Even if it is slow,
>     I'm sure someone out there will be able to help optimise it!
> 
>     cheers,
>     Richard
> 
>     On Sun, October 21, 2007 5:30 pm, vineith kaul wrote:
>     > Hi,
>     >
>     > Are there functions to calculate evolutionary pairwise distances like
>     > Kimura2P, Finkelstein etc. in BioJava?
>     > I did write something on my own, but on large sequences it runs terribly
>     > slow and I am not even sure if that's right.
>     > --
>     > Vineith Kaul
>     > Masters Student Bioinformatics
>     > The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
>     > Georgia Tech, Atlanta
>     >
> 
> 
>     --
>     Richard Holland
>     BioMart ( http://www.biomart.org/)
>     EMBL-EBI
>     Hinxton, Cambridgeshire CB10 1SD, UK
> 
> 
> 
> 
> -- 
> Vineith Kaul
> Masters Student Bioinformatics
> The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
> Georgia Tech, Atlanta


