[Biojava-l] Evolutionary distances
Andy Yates
ayates at ebi.ac.uk
Wed Oct 24 13:49:38 UTC 2007
Beat me to it :)
Andy
Mark Schreiber wrote:
> It appears it is as simple as:
>
> Runtime.getRuntime().availableProcessors();
>
> - Mark
>
> On 10/24/07, Mark Schreiber <markjschreiber at gmail.com> wrote:
>> I'm not aware of a way to determine the number of CPU's within a
>> program although possibly it is one the the environment variables
>> available from System.
>>
>> Even if it can't be determined there could be a method argument to
>> specify the number of threads to spawn.
>>
>> - Mark
>>
>> On 10/24/07, Richard Holland <holland at ebi.ac.uk> wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> This particular code could easily be parallelised - given N threads, you
>>> can simply divide the input into N chunks and get each thread to process
>>> 1/Nth of the input. You then combine the output of each thread to do the
>>> final calculation.
>>>
>>> But, it'd be bad practice to always fork a predetermined N threads for a
>>> given task. It'd be much better to somehow be able to ask 'how parallel
>>> can I make this?' at runtime by checking system resources, or maybe get
>>> the parallel-savvy user to set an optional BioJava-wide parallelisation
>>> hint. N could then be determined and the task divided appropriately.
>>>
>>> cheers,
>>> Richard
>>>
>>> Mark Schreiber wrote:
>>>> Another important consideration after optimization is can the task be
>>>> multithreaded? Almost all modern computers have at least 2 cores. So
>>>> if the algorithm can be parallelized you will get some performance
>>>> bonus on most machines.
>>>>
>>>> Modern JVM's will automagically try to use idle CPU's to execute new
>>>> threads spawned by the programmer.
>>>>
>>>> - Mark
>>>>
>>>> On 10/24/07, Andy Yates <ayates at ebi.ac.uk> wrote:
>>>>> Yes a very good point & one I was going to make before hand but forgot :)
>>>>>
>>>>> Also not to mention that micro-benchmarks/profiling in Java are
>>>>> notorious for giving false results due to VM warmup & JIT compilation
>>>>> optimisations. There is a framework hosted on Java.net somewhere which
>>>>> can perform VM warmups and code iterations to produce more accurate
>>>>> benchmarking results; but the name escapes me at the moment.
>>>>>
>>>>> However looking at this particular code I get the feeling that this is
>>>>> about as fast as its going to get without someone doing bitwise XOR
>>>>> operations or some C code ... that's not an open invitation for people
>>>>> to start recoding this in C :). At the end of the day the key to
>>>>> optimisation is to ask the question "is it fast enough already?". If it
>>>>> is then there's no point :)
>>>>>
>>>>> Andy
>>>>>
>>>>> Mark Schreiber wrote:
>>>>>> Hi -
>>>>>>
>>>>>> >From experience the best way to optimize java code is to run a
>>>>>> profiler. The one in Netbeans is quite good.
>>>>>>
>>>>>> The reason is that the hotspot or JIT compilers might natively compile
>>>>>> the part of the code that you think is slow and actually make it
>>>>>> faster than something else which becomes the bottle neck. Using a good
>>>>>> profiler you can detect how much time is spent in each method and pin
>>>>>> point some candidate methods for optimization. You can also see if
>>>>>> there is a burden due to creation of lots of objects.
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>>> On 10/24/07, Andy Yates <ayates at ebi.ac.uk> wrote:
>>>>>>> Our code is very similar but not identical. The original programmer
>>>>>>> shortcutted a lot of else if conditions by considering if the two bases
>>>>>>> were equal or not. It can then calculate the transitional changes &
>>>>>>> assume the rest are transversional.
>>>>>>>
>>>>>>> In terms of speed of both pieces of code I can't see an obvious way to
>>>>>>> speed it up. Probably in our code removing the 10 or so calls to
>>>>>>> String.charAt() with a two calls & referencing those chars might help
>>>>>>> but in all honesty I cannot say.
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> Richard Holland wrote:
>>>> Thanks.
>>>>
>>>> Your code is similar to the code we have in
>>>> org.biojavax.bio.phylo.MultipleHitCorrection. I haven't checked it to
>>>> see if it is identical, but it probably is.
>>>>
>>>> You can call our code like this:
>>>>
>>>> // import statement for biojava phylo stuff
>>>> import org.biojavax.bio.phylo.*;
>>>>
>>>> // ...rest of code goes here
>>>>
>>>> // call Kimura2P
>>>> String seq1 = ...; // Get seq1 and seq2 from somewhere
>>>> String seq2 = ...;
>>>> double result = MultipleHitCorrection.Kimura2P(seq1, seq2);
>>>>
>>>> Note that our implementation expects sequence strings to be in upper
>>>> case, so you'll need to make sure your data is upper case or has been
>>>> converted to upper case before calling our method.
>>>>
>>>> cheers,
>>>> Richard
>>>>
>>>> vineith kaul wrote:
>>>>>>>>>> This is what I have .....Thanks a lot fr the help.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> //Method to calculate the Kimura 2 parameter distance
>>>>>>>>>> public static double K2P(String sequence1,String sequence2){
>>>>>>>>>> long p=0,q=0,numberOfAlignedSites=0; // P= transitional
>>>>>>>>>> differences (A<->G & T<->C) ; Q= transversional differences (A/G<-->C/T)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> char[] seq1array=sequence1.toCharArray();
>>>>>>>>>> char[] seq2array=sequence2.toCharArray();
>>>>>>>>>>
>>>>>>>>>> for(int i=0;i<seq1array.length;i++){
>>>>>>>>>> // Number of aligned sites
>>>>>>>>>> if(((seq1array[i]=='a') ||
>>>>>>>>>> (seq1array[i]=='A')||(seq1array[i]=='g') ||
>>>>>>>>>> (seq1array[i]=='G')||(seq1array[i]=='c') || (seq1array[i]=='C') ||
>>>>>>>>>> (seq1array[i]=='t') || (seq1array[i]=='T')) && ((seq2array[i]=='a') ||
>>>>>>>>>> (seq2array[i]=='A')||(seq2array[i]=='c') ||
>>>>>>>>>> (seq2array[i]=='C')||(seq2array[i]=='t') ||
>>>>>>>>>> (seq2array[i]=='T')||(seq2array[i]=='g') || (seq2array[i]=='G'))) {
>>>>>>>>>>
>>>>>>>>>> numberOfAlignedSites++;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
>>>>>>>>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>>>>>>>>>> p++;
>>>>>>>>>> }
>>>>>>>>>> else
>>>>>>>>>> if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
>>>>>>>>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>>>>>>>>>> p++;
>>>>>>>>>> }
>>>>>>>>>> else
>>>>>>>>>> if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
>>>>>>>>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>>>>>>>>>> p++;
>>>>>>>>>> }
>>>>>>>>>> else
>>>>>>>>>> if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
>>>>>>>>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>>>>>>>>>> p++;
>>>>>>>>>> }
>>>>>>>>>> else
>>>>>>>>>> if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
>>>>>>>>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>>>>>>>>>> q++;
>>>>>>>>>> }
>>>>>>>>>> else
>>>>>>>>>> if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
>>>>>>>>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>>>>>>>>>> q++;
>>>>>>>>>> }
>>>>>>>>>> else
>>>>>>>>>> if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
>>>>>>>>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>>>>>>>>>> q++;
>>>>>>>>>> }
>>>>>>>>>> else
>>>>>>>>>> if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
>>>>>>>>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>>>>>>>>>> q++;
>>>>>>>>>> }
>>>>>>>>>> else
>>>>>>>>>> if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
>>>>>>>>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>>>>>>>>>> q++;
>>>>>>>>>> }
>>>>>>>>>> else
>>>>>>>>>> if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
>>>>>>>>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>>>>>>>>>> q++;
>>>>>>>>>> }
>>>>>>>>>> else
>>>>>>>>>> if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
>>>>>>>>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>>>>>>>>>> q++;
>>>>>>>>>> }
>>>>>>>>>> else
>>>>>>>>>> if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
>>>>>>>>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>>>>>>>>>> q++;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> double P = 1.0 - (2.0 * ((double)p)/numberOfAlignedSites) -
>>>>>>>>>> (((double)q)/numberOfAlignedSites);
>>>>>>>>>> double Q = 1.0 - (2.0 * ((double)q)/numberOfAlignedSites);
>>>>>>>>>> System.out.print(numberOfAlignedSites+"\t"+p+"\t"+q+"\t");
>>>>>>>>>> double dist = (-0.5 * Math.log(P)) - ( 0.25 * Math.log(Q));
>>>>>>>>>> return dist;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 10/22/07, *Richard Holland* <holland at ebi.ac.uk
>>>>>>>>>> <mailto:holland at ebi.ac.uk>> wrote:
>>>>>>>>>>
>>>>>>>>>> You should take a look at the latest 1.5 release, in the
>>>>>>>>>> org.biojavax.bio.phylo packages. This code is the beginnings of some
>>>>>>>>>> phylogenetics code that will perform tasks as you describe. The future
>>>>>>>>>> plan is to extend this code to cover a wider range of use cases.
>>>>>>>>>> Kimura2P
>>>>>>>>>> is already implemented here, in
>>>>>>>>>> org.biojavax.bio.phylo.MultipleHitCorrection.
>>>>>>>>>>
>>>>>>>>>> If you can't find code that will do what you want, but have written some
>>>>>>>>>> before, then please do feel free to contribute it. Even if it is
>>>>>>>>>> slow, I'm
>>>>>>>>>> sure someone out there will be able to help optimise it!
>>>>>>>>>>
>>>>>>>>>> cheers,
>>>>>>>>>> Richard
>>>>>>>>>>
>>>>>>>>>> On Sun, October 21, 2007 5:30 pm, vineith kaul wrote:
>>>>>>>>>> > Hi,
>>>>>>>>>> >
>>>>>>>>>> > Are there functions to calculate evolutionary pairwise distances like
>>>>>>>>>> > Kimura2P,Finkelstein etc in Biojava
>>>>>>>>>> > I did write smthng on my own but on large sequences it runs terribly
>>>>>>>>>> > slow and I am not even sure if thats right.
>>>>>>>>>> > --
>>>>>>>>>> > Vineith Kaul
>>>>>>>>>> > Masters Student Bioinformatics
>>>>>>>>>> > The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
>>>>>>>>>> > Georgia Tech, Atlanta
>>>>>>>>>> > _______________________________________________
>>>>>>>>>> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>>>>>>>> <mailto:Biojava-l at lists.open-bio.org>
>>>>>>>>>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Richard Holland
>>>>>>>>>> BioMart ( http://www.biomart.org/)
>>>>>>>>>> EMBL-EBI
>>>>>>>>>> Hinxton, Cambridgeshire CB10 1SD, UK
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Vineith Kaul
>>>>>>>>>> Masters Student Bioinformatics
>>>>>>>>>> The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
>>>>>>>>>> Georgia Tech, Atlanta
>>> _______________________________________________
>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>>> _______________________________________________
>>>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>>>
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.2.2 (GNU/Linux)
>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>>
>>> iD8DBQFHH0nB4C5LeMEKA/QRAjW9AJwPcaHByaQhAQRPLJOZt4gQRF0TbgCeIa6P
>>> IEyRleSs1+AziCvfhcES8wI=
>>> =uLDm
>>> -----END PGP SIGNATURE-----
>>>
More information about the Biojava-l
mailing list