[Biojava-l] Evolutionary distances
Andy Yates
ayates at ebi.ac.uk
Wed Oct 24 08:09:13 UTC 2007
Our code is very similar but not identical. The original programmer
skipped a lot of the else-if conditions by first checking whether the two
bases are equal. The code can then count the transitional changes and
assume the rest are transversional.
As for speed, I can't see an obvious way to make either piece of code
faster. In our code, replacing the 10 or so calls to String.charAt() with
two calls and reusing those chars might help, but in all honesty I cannot
say.
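
A minimal sketch of the two ideas above, assuming upper-case input and skipping any position where either character is not A, C, G or T. The names TransitionCounter, count, isPurine and isBase are hypothetical; this is not the BioJava implementation. Each character is read once per position, equal bases are skipped, and an unequal pair counts as a transition when both bases are purines or both are pyrimidines, otherwise as a transversion.

```java
// Hypothetical helper, not part of BioJava.
class TransitionCounter {

    /** True for a purine (A or G). Assumes upper-case input. */
    static boolean isPurine(char c) {
        return c == 'A' || c == 'G';
    }

    /** True for an unambiguous DNA base. Assumes upper-case input. */
    static boolean isBase(char c) {
        return c == 'A' || c == 'C' || c == 'G' || c == 'T';
    }

    /**
     * Counts aligned sites, transitions and transversions.
     * Returns {alignedSites, transitions, transversions}.
     */
    static long[] count(String s1, String s2) {
        long n = 0, p = 0, q = 0;
        // Only compare positions present in both strings.
        int len = Math.min(s1.length(), s2.length());
        for (int i = 0; i < len; i++) {
            // Read each character once and reuse the local variables.
            char a = s1.charAt(i);
            char b = s2.charAt(i);
            if (!isBase(a) || !isBase(b)) {
                continue; // gap or ambiguity code: not an aligned site
            }
            n++;
            if (a == b) {
                continue; // identical bases: no change to classify
            }
            if (isPurine(a) == isPurine(b)) {
                p++; // A<->G or C<->T
            } else {
                q++; // every other mismatch is a transversion
            }
        }
        return new long[] {n, p, q};
    }
}
```

For example, count("AAGCT-", "AGTCTA") sees five aligned sites, one transition (A/G) and one transversion (G/T). Whether this is measurably faster than the longer if/else chain would need profiling.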
Andy
Richard Holland wrote:
> Thanks.
>
> Your code is similar to the code we have in
> org.biojavax.bio.phylo.MultipleHitCorrection. I haven't checked it to
> see if it is identical, but it probably is.
>
> You can call our code like this:
>
> // import statement for biojava phylo stuff
> import org.biojavax.bio.phylo.*;
>
> // ...rest of code goes here
>
> // call Kimura2P
> String seq1 = ...; // Get seq1 and seq2 from somewhere
> String seq2 = ...;
> double result = MultipleHitCorrection.Kimura2P(seq1, seq2);
>
> Note that our implementation expects sequence strings to be in upper
> case, so you'll need to make sure your data is upper case or has been
> converted to upper case before calling our method.
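
The upper-case requirement above could be met with a small wrapper like the one sketched here. SequenceNormalizer and toUpper are hypothetical names, and the BioJava call is shown only as a comment, copied from the usage earlier in this message, so that the sketch compiles without the library.

```java
import java.util.Locale;

// Hypothetical helper, not part of BioJava.
class SequenceNormalizer {

    /**
     * Upper-cases a sequence string. Locale.ROOT keeps the result
     * independent of the JVM's default locale.
     */
    static String toUpper(String seq) {
        return seq.toUpperCase(Locale.ROOT);
    }

    // Intended use, assuming BioJava 1.5 is on the classpath:
    // double result = MultipleHitCorrection.Kimura2P(
    //         SequenceNormalizer.toUpper(seq1),
    //         SequenceNormalizer.toUpper(seq2));
}
```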
>
> cheers,
> Richard
>
> vineith kaul wrote:
>> This is what I have. Thanks a lot for the help.
>>
>>
>> //Method to calculate the Kimura 2 parameter distance
>> public static double K2P(String sequence1,String sequence2){
>> // p = transitional differences (A<->G and T<->C);
>> // q = transversional differences (A/G<->C/T)
>> long p=0,q=0,numberOfAlignedSites=0;
>>
>>
>> char[] seq1array=sequence1.toCharArray();
>> char[] seq2array=sequence2.toCharArray();
>>
>> for(int i=0;i<seq1array.length;i++){
>> // Number of aligned sites
>> if(((seq1array[i]=='a') ||
>> (seq1array[i]=='A')||(seq1array[i]=='g') ||
>> (seq1array[i]=='G')||(seq1array[i]=='c') || (seq1array[i]=='C') ||
>> (seq1array[i]=='t') || (seq1array[i]=='T')) && ((seq2array[i]=='a') ||
>> (seq2array[i]=='A')||(seq2array[i]=='c') ||
>> (seq2array[i]=='C')||(seq2array[i]=='t') ||
>> (seq2array[i]=='T')||(seq2array[i]=='g') || (seq2array[i]=='G'))) {
>>
>> numberOfAlignedSites++;
>> }
>>
>> if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>> p++;
>> }
>> else
>> if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>> p++;
>> }
>> else
>> if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>> p++;
>> }
>> else
>> if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>> p++;
>> }
>> else
>> if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>> q++;
>> }
>> else
>> if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>> q++;
>> }
>> else
>> if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>> q++;
>> }
>> else
>> if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>> q++;
>> }
>> else
>> if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>> q++;
>> }
>> else
>> if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>> q++;
>> }
>> else
>> if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>> q++;
>> }
>> else
>> if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>> q++;
>> }
>>
>>
>>
>>
>> }
>>
>> double P = 1.0 - (2.0 * ((double)p)/numberOfAlignedSites) -
>> (((double)q)/numberOfAlignedSites);
>> double Q = 1.0 - (2.0 * ((double)q)/numberOfAlignedSites);
>> System.out.print(numberOfAlignedSites+"\t"+p+"\t"+q+"\t");
>> double dist = (-0.5 * Math.log(P)) - ( 0.25 * Math.log(Q));
>> return dist;
>> }
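
For reference, the last few lines of the method above compute the Kimura 2-parameter distance from the counts, written here with the code's own names: p transitions, q transversions and n = numberOfAlignedSites. The numbers in the second line are illustrative values chosen for this note, not data from the thread.

```latex
\[
d = -\tfrac{1}{2}\ln\!\left(1 - \frac{2p}{n} - \frac{q}{n}\right)
    - \tfrac{1}{4}\ln\!\left(1 - \frac{2q}{n}\right)
\]
% Illustrative values (assumed): n = 100, p = 10, q = 5
\[
d = -\tfrac{1}{2}\ln(0.75) - \tfrac{1}{4}\ln(0.90)
  \approx 0.144 + 0.026 = 0.170
\]
```

If either logarithm argument is zero or negative, which can happen for very divergent sequences or when n is 0, Math.log returns -Infinity or NaN and the distance is undefined.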
>>
>> On 10/22/07, Richard Holland <holland at ebi.ac.uk> wrote:
>>
>> You should take a look at the latest 1.5 release, in the
>> org.biojavax.bio.phylo packages. This code is the beginnings of some
>> phylogenetics code that will perform tasks as you describe. The future
>> plan is to extend this code to cover a wider range of use cases.
>> Kimura2P is already implemented here, in
>> org.biojavax.bio.phylo.MultipleHitCorrection.
>>
>> If you can't find code that will do what you want, but have written some
>> before, then please do feel free to contribute it. Even if it is slow,
>> I'm sure someone out there will be able to help optimise it!
>>
>> cheers,
>> Richard
>>
>> On Sun, October 21, 2007 5:30 pm, vineith kaul wrote:
>> > Hi,
>> >
>> > Are there functions to calculate evolutionary pairwise distances like
>> > Kimura2P, Finkelstein etc. in Biojava?
>> > I did write something on my own, but on large sequences it runs terribly
>> > slowly and I am not even sure if it's right.
>> > --
>> > Vineith Kaul
>> > Masters Student Bioinformatics
>> > The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
>> > Georgia Tech, Atlanta
>> > _______________________________________________
>> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>> >
>>
>>
>> --
>> Richard Holland
>> BioMart ( http://www.biomart.org/)
>> EMBL-EBI
>> Hinxton, Cambridgeshire CB10 1SD, UK
>>
>>
>>
>>
>> --
>> Vineith Kaul
>> Masters Student Bioinformatics
>> The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
>> Georgia Tech, Atlanta