[Biojava-l] Evolutionary distances

Andy Yates ayates at ebi.ac.uk
Wed Oct 24 08:09:13 UTC 2007


Our code is very similar but not identical. The original programmer 
short-cut a lot of the else-if conditions by first checking whether the two 
bases are equal. If they differ, the code tests for a transitional change & 
assumes anything else is transversional.
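
To illustrate the shortcut, here is a minimal sketch. It is not the actual 
BioJava source: the class and method names are made up for the example, and 
it assumes both characters are already upper-case A, C, G or T.

```java
// Hypothetical illustration of the "equal / transition / otherwise
// transversion" shortcut; not the BioJava implementation.
final class SubstitutionShortcut {

    /** True when the pair is A<->G (purines) or C<->T (pyrimidines). */
    static boolean isTransition(char a, char b) {
        return (a == 'A' && b == 'G') || (a == 'G' && b == 'A')
            || (a == 'C' && b == 'T') || (a == 'T' && b == 'C');
    }

    /** Returns 0 for identical bases, 1 for a transition, 2 for a transversion. */
    static int classify(char a, char b) {
        if (a == b) {
            return 0;                        // no change at this site
        }
        // The bases differ: anything that is not a transition is
        // counted as a transversion, so the eight explicit
        // transversion tests are not needed.
        return isTransition(a, b) ? 1 : 2;
    }
}
```

With valid input this needs at most one equality test and four pair tests 
per site, instead of walking a chain of twelve else-if branches.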

As for speed, I can't see an obvious way to make either piece of code 
faster. In our code, replacing the 10 or so calls to String.charAt() with 
two calls & reusing those chars might help, but in all honesty I cannot say.
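
A rough sketch of the two-calls idea follows. It is not the 
MultipleHitCorrection code: K2PSketch, isBase and distance are hypothetical 
names, and stopping at the shorter sequence, upper-casing each character and 
skipping positions that are not A, C, G or T are assumptions of this sketch. 
Whether it is measurably faster is untested.

```java
// Hypothetical sketch: read each character once per position and
// reuse the local variables for every comparison.
final class K2PSketch {

    static boolean isBase(char c) {
        return c == 'A' || c == 'C' || c == 'G' || c == 'T';
    }

    /** Kimura 2-parameter distance between two aligned sequences. */
    static double distance(String s1, String s2) {
        int n = Math.min(s1.length(), s2.length());
        long aligned = 0, transitions = 0, transversions = 0;

        for (int i = 0; i < n; i++) {
            // The only two charAt() calls for this position.
            char a = Character.toUpperCase(s1.charAt(i));
            char b = Character.toUpperCase(s2.charAt(i));

            if (!isBase(a) || !isBase(b)) {
                continue;                    // gap or ambiguity code
            }
            aligned++;
            if (a == b) {
                continue;                    // no change
            }
            boolean bothPurines = (a == 'A' || a == 'G') && (b == 'A' || b == 'G');
            boolean bothPyrimidines = (a == 'C' || a == 'T') && (b == 'C' || b == 'T');
            if (bothPurines || bothPyrimidines) {
                transitions++;
            } else {
                transversions++;
            }
        }

        // Proportions of transitions (p) and transversions (q).
        // With no aligned sites these are NaN, and so is the result.
        double p = (double) transitions / aligned;
        double q = (double) transversions / aligned;
        return -0.5 * Math.log(1.0 - 2.0 * p - q)
             - 0.25 * Math.log(1.0 - 2.0 * q);
    }
}
```

Calling K2PSketch.distance("AAAA", "AAAG") counts one transition in four 
aligned sites, so the result is -0.5 * ln(0.5).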

Andy

Richard Holland wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Thanks.
> 
> Your code is similar to the code we have in
> org.biojavax.bio.phylo.MultipleHitCorrection. I haven't checked it to
> see if it is identical, but it probably is.
> 
> You can call our code like this:
> 
>  // import statement for biojava phylo stuff
>  import org.biojavax.bio.phylo.*;
> 
>  // ...rest of code goes here
> 
>  // call Kimura2P
>  String seq1 = ...; // Get seq1 and seq2 from somewhere
>  String seq2 = ...;
>  double result = MultipleHitCorrection.Kimura2P(seq1, seq2);
> 
> Note that our implementation expects sequence strings to be in upper
> case, so you'll need to make sure your data is upper case or has been
> converted to upper case before calling our method.
> 
> cheers,
> Richard
> 
> vineith kaul wrote:
>> This is what I have. Thanks a lot for the help.
>>
>>
>> //Method to calculate the Kimura 2 parameter distance
>> public static double K2P(String sequence1,String sequence2){
>>         // p = transitional differences (A<->G & T<->C);
>>         // q = transversional differences (A/G<-->C/T)
>>         long p=0,q=0,numberOfAlignedSites=0;
>>
>>
>>         char[] seq1array=sequence1.toCharArray();
>>         char[] seq2array=sequence2.toCharArray();
>>
>>         for(int i=0;i<Math.min(seq1array.length,seq2array.length);i++){
>>                                 // Number of aligned sites
>>                 if(((seq1array[i]=='a') ||
>> (seq1array[i]=='A')||(seq1array[i]=='g') ||
>> (seq1array[i]=='G')||(seq1array[i]=='c') || (seq1array[i]=='C') ||
>> (seq1array[i]=='t') || (seq1array[i]=='T')) && ((seq2array[i]=='a') ||
>> (seq2array[i]=='A')||(seq2array[i]=='c') ||
>> (seq2array[i]=='C')||(seq2array[i]=='t') ||
>> (seq2array[i]=='T')||(seq2array[i]=='g') || (seq2array[i]=='G'))) {
>>
>>                         numberOfAlignedSites++;
>>                 }
>>
>>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>>                         p++;
>>                 }
>>                 else
>>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>>                         p++;
>>                 }
>>                 else
>>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>>                         p++;
>>                 }
>>                 else
>>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>>                         p++;
>>                 }
>>                 else
>>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>>                                 q++;
>>                         }
>>                 else
>>                 if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>>                                 q++;
>>                         }
>>                 else
>>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>>                                         q++;
>>                                 }
>>                 else
>>                 if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>>                                         q++;
>>                                 }
>>                 else
>>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>>                                         q++;
>>                                 }
>>                 else
>>                 if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>>                                         q++;
>>                                 }
>>                 else
>>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>>                                         q++;
>>                                 }
>>                 else
>>                 if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>>                                         q++;
>>                                 }
>>
>>
>>
>>
>>         }
>>
>>          double P = 1.0 - (2.0 * ((double)p)/numberOfAlignedSites) -
>> (((double)q)/numberOfAlignedSites);
>>          double Q = 1.0 - (2.0 * ((double)q)/numberOfAlignedSites);
>>          System.out.print(numberOfAlignedSites+"\t"+p+"\t"+q+"\t");
>>          double dist = (-0.5 * Math.log(P)) - ( 0.25 * Math.log(Q));
>>          return dist;
>> }
>>
>> On 10/22/07, Richard Holland <holland at ebi.ac.uk> wrote:
>>
>>     You should take a look at the latest 1.5 release, in the
>>     org.biojavax.bio.phylo packages. This code is the beginnings of some
>>     phylogenetics code that will perform tasks as you describe. The future
>>     plan is to extend this code to cover a wider range of use cases.
>>     Kimura2P is already implemented here, in
>>     org.biojavax.bio.phylo.MultipleHitCorrection.
>>
>>     If you can't find code that will do what you want, but have written some
>>     before, then please do feel free to contribute it. Even if it is slow,
>>     I'm sure someone out there will be able to help optimise it!
>>
>>     cheers,
>>     Richard
>>
>>     On Sun, October 21, 2007 5:30 pm, vineith kaul wrote:
>>     > Hi,
>>     >
>>     > Are there functions to calculate evolutionary pairwise distances like
>>     > Kimura2P, Finkelstein etc. in BioJava?
>>     > I did write something on my own, but on large sequences it runs terribly
>>     > slowly and I am not even sure if it's right.
>>     > --
>>     > Vineith Kaul
>>     > Masters Student Bioinformatics
>>     > The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
>>     > Georgia Tech, Atlanta
>>     > _______________________________________________
>>     > Biojava-l mailing list  -   Biojava-l at lists.open-bio.org
>>     > http://lists.open-bio.org/mailman/listinfo/biojava-l
>>     >
>>
>>
>>     --
>>     Richard Holland
>>     BioMart ( http://www.biomart.org/)
>>     EMBL-EBI
>>     Hinxton, Cambridgeshire CB10 1SD, UK
>>
>>
>>
>>
>> -- 
>> Vineith Kaul
>> Masters Student Bioinformatics
>> The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
>> Georgia Tech, Atlanta
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> 
> iD8DBQFHHvm34C5LeMEKA/QRAlc3AJ9GAMML/z5+BBl12PA2a/Zyz/CHDQCdFWKa
> 4iKvsyBj2uznhhjTF9EYDFE=
> =LALE
> -----END PGP SIGNATURE-----


