[Biojava-dev] Why BJ3 should be multithreaded

Andy Yates ayates at ebi.ac.uk
Wed Apr 9 11:03:19 UTC 2008



Most of the time, farm management software such as LSF (please correct 
me if I'm wrong) looks at the amount of CPU time a process takes up and 
the number of threads it detects, not only the number of processes you 
have in a queue. So a multi-threaded BioJava should not pose a problem 
to these systems. Not to mention that with the newer multi-core 
computers, threaded software is becoming the only way to take full 
advantage of the available power.

Where you would want to avoid multi-threading is if you are in a queue 
like LSF and your x Java processes all get chucked onto the same 
machine. If that many processor-hungry operations all try to create 
threads at once, they end up competing for the same cores and it's not 
going to behave as optimally as you might hope.
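
For that farm case, one option is to let each job cap its own thread 
count. Below is a minimal sketch, assuming a made-up system property 
called "biojava.threads" and a made-up helper class ThreadBudget 
(neither exists in BioJava). It defaults to every available core, as 
Andreas suggests below, and clamps whatever the user asks for.

```java
// Hypothetical helper, not part of BioJava. The property name is an assumption.
final class ThreadBudget {
    static final String PROPERTY = "biojava.threads"; // made-up name

    private ThreadBudget() {
    }

    /**
     * Turns a requested thread count into a usable one.
     * Missing or unparsable values fall back to all available cores;
     * anything else is clamped to the range 1..availableCores.
     */
    static int resolve(String requested, int availableCores) {
        if (requested == null || requested.trim().isEmpty()) {
            return availableCores;
        }
        try {
            int n = Integer.parseInt(requested.trim());
            return Math.max(1, Math.min(n, availableCores));
        } catch (NumberFormatException e) {
            return availableCores;
        }
    }

    /** Reads the property and the core count from the running JVM. */
    static int resolve() {
        return resolve(System.getProperty(PROPERTY),
                Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) {
        System.out.println("threads to use = " + resolve());
    }
}
```

A farm job would then be started with something like 
"java -Dbiojava.threads=1 ..." so that x processes on one node stay at 
x threads, while a desktop run with no property set uses every core.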

Personally though I'd still err on the side of caution WRT 
multi-threading and not have it as part of the default tools but as an 
Object I can instantiate to do my multi-threading work (so it's a 
choice at the user's level rather than the framework level). Then, 
using the Java5 executor framework, we let users submit work to pools 
of threads. Couple this with a rule that only immutable messages are 
passed between threads/callables (since values shared by threads are 
probably the number one cause of **** ups) and you'll have one heck of 
a kick-ass scalable framework ;-)
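
To make that concrete, here is a rough sketch of the pattern using only 
java.util.concurrent from Java5. The class names (GcCountDemo, 
SequenceChunk, GcCounter) and the GC-counting task are invented for 
illustration and are not BioJava API. The point is that each Callable 
only ever sees an immutable message, and the user decides whether a 
pool exists at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example classes, not part of BioJava.
final class GcCountDemo {

    /** Immutable message: final fields, no setters, safe to hand to any thread. */
    static final class SequenceChunk {
        private final String id;
        private final String residues;

        SequenceChunk(String id, String residues) {
            this.id = id;
            this.residues = residues;
        }

        String getId() { return id; }
        String getResidues() { return residues; }
    }

    /** Stateless task: reads its own chunk and returns a result, nothing shared. */
    static final class GcCounter implements Callable<Integer> {
        private final SequenceChunk chunk;

        GcCounter(SequenceChunk chunk) {
            this.chunk = chunk;
        }

        public Integer call() {
            int gc = 0;
            String s = chunk.getResidues();
            for (int i = 0; i < s.length(); i++) {
                char c = Character.toUpperCase(s.charAt(i));
                if (c == 'G' || c == 'C') {
                    gc++;
                }
            }
            return gc;
        }
    }

    /** The user chooses to thread by creating the pool and submitting work to it. */
    static int totalGc(List<SequenceChunk> chunks, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (SequenceChunk chunk : chunks) {
                futures.add(pool.submit(new GcCounter(chunk)));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // blocks until that task has finished
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<SequenceChunk> chunks = new ArrayList<SequenceChunk>();
        chunks.add(new SequenceChunk("a", "GATTACA"));
        chunks.add(new SequenceChunk("b", "ggcc"));
        System.out.println("GC count = " + totalGc(chunks, 2));
    }
}
```

Because results come back through Futures rather than a shared 
counter, there is no mutable state for two threads to fight over.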

Andy

Andreas Prlic wrote:
> Hi,
> 
> I like the idea of having support for multiple threads. Only thing is, 
> when running BioJava on our compute farm, I am pretty sure our admins 
> won't be happy if BJ would use more than just a single CPU, unless run 
> on special hardware. As such there should be a BJ-wide configuration 
> setting that determines how many CPUs are used (and the default could 
> be all of them).
> 
> Andreas
> 
> 
> On 9 Apr 2008, at 09:28, Andy Yates wrote:
> 
>> Lo,
>>
>> This is the kind of problem Java7 is attempting to solve with the 
>> fork-join framework (which really is a rip-off of Google's MapReduce). 
>> There are two ways of looking at thread safety & how to implement it:
>>
>> * Packages which could be threaded or want to be threaded are 
>> programmed with threading in mind using items from the util.concurrent 
>> package to split, queue & work with data points.
>>
>> * Packages can be created as required & have data passed to them for 
>> processing in a stateless manner, much in the same way servlet 
>> engines and a lot of web frameworks run.
>>
>> The first way does mean we can support environments with useful 
>> multi-threaded support (no point in threading on a single CPU/core 
>> box) from the word go. The second way would require some plumbing on 
>> the user's behalf but this would be very easy plumbing; the majority 
>> of which we could write (like wrapping things in instances of Callables).
>>
>> Anyway my 2p worth :)
>>
>> Andy
>>
>> Mark Schreiber wrote:
>>> Hi -
>>> I was just playing with threads to see how efficient they are on one 
>>> of our old 4 CPU IBM servers.  The following fairly naive program 
>>> splits a large array of numbers and sums them all up.  The 
>>> multi-threaded version is 2.5 times faster even allowing for thread 
>>> overhead. The program could be even better if I make more use of the 
>>> java1.5 concurrent package.
>>> Similar tasks in biojava would include training distributions, 
>>> which should see similar performance improvements. Much of the 
>>> current biojava doesn't make use of threads and worse, requires the 
>>> developer to manage all the thread safety themselves.
>>> - Mark
>>> /*
>>>  * To change this template, choose Tools | Templates
>>>  * and open the template in the editor.
>>>  */
>>> package concurrent;
>>> import java.util.concurrent.atomic.AtomicLong;
>>> /**
>>>  * This program demos the use of threads to sum a large array of
>>>  * integers.
>>>  * @author Mark Schreiber
>>>  */
>>> public class ThreadedAdder {
>>>     static int processors = Runtime.getRuntime().availableProcessors();
>>>     int bigNumber = 10000000;
>>>     int[] bigArray = new int[bigNumber * processors];
>>>     public ThreadedAdder(){
>>>         //make a big array of integers
>>>         //(10 000 000 numbers for each processor)
>>>         for(int i = 0; i < bigArray.length; i++){
>>>             //random number between 0 and 99
>>>             bigArray[i] = (int)(Math.random() * 100.0);
>>>         }
>>>     }
>>>     public void singleThreadedAdd(){
>>>         //long, because an int total can overflow once the array
>>>         //holds more than about 40 million values
>>>         long result = 0;
>>>         //single threaded sum
>>>         long start = System.currentTimeMillis();
>>>         for(int number : bigArray){
>>>             result += number;
>>>         }
>>>         long time = System.currentTimeMillis() - start;
>>>         System.out.println("Calculation time = "+time+" ms");
>>>         System.out.println("total = "+result);
>>>     }
>>>     public void multiThreadedAdd() throws InterruptedException{
>>>         //shared total; each thread adds its partial sum exactly once
>>>         AtomicLong total = new AtomicLong();
>>>         long start = System.currentTimeMillis();
>>>         AddingThread[] threads = new AddingThread[processors];
>>>         for(int i = 0; i < threads.length; i++){
>>>             threads[i] = new AddingThread("Thread "+i, i * bigNumber, total);
>>>             System.out.println(threads[i].getName()+" starting");
>>>             threads[i].start();
>>>         }
>>>         for(Thread thread : threads){
>>>             //make sure everyone is finished
>>>             thread.join();
>>>         }
>>>         long time = System.currentTimeMillis() - start;
>>>         System.out.println("Calculation time = "+time+" ms");
>>>         System.out.println("total = "+total);
>>>     }
>>>     /**
>>>      * @param args the command line arguments
>>>      */
>>>     public static void main(String[] args) throws Exception{
>>>         //how many processors do I have?
>>>         System.out.println("Available processors = "+processors);
>>>         System.out.println("Initializing number array");
>>>         ThreadedAdder adder = new ThreadedAdder();
>>>         System.out.println("single thread add");
>>>         adder.singleThreadedAdd();
>>>         System.out.println("multi thread add");
>>>         adder.multiThreadedAdd();
>>>     }
>>>     //inner class: each thread sums its own slice of bigArray
>>>     public class AddingThread extends Thread{
>>>         long internalTotal = 0;
>>>         int offSet = 0;
>>>         AtomicLong callBackTotal;
>>>         public AddingThread(String name, int offSet,
>>>                 AtomicLong callBackTotal){
>>>             super(name);
>>>             this.offSet = offSet;
>>>             this.callBackTotal = callBackTotal;
>>>         }
>>>         @Override
>>>         public void run(){
>>>             for(int i = offSet; i < offSet + bigNumber; i++){
>>>                 internalTotal += bigArray[i];
>>>             }
>>>             callBackTotal.addAndGet(internalTotal);
>>>             System.out.println(this.getName()+" complete");
>>>         }
>>>     }
>>> }
> 
> -----------------------------------------------------------------------
> 
> Andreas Prlic      Wellcome Trust Sanger Institute
>                               Hinxton, Cambridge CB10 1SA, UK
>                               +44 (0) 1223 49 6891
> 
> -----------------------------------------------------------------------
> 
> 
> 
> 
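
For comparison, Mark's adder quoted above could be reworked along the 
lines of the second approach (stateless Callables handed to a pool). 
This is only a sketch under my own assumptions: PooledAdder and 
RangeSum are invented names, the array is treated as read-only once the 
tasks are submitted, and I have not timed it against the Thread-based 
version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical rework of the quoted ThreadedAdder, for illustration only.
final class PooledAdder {

    /** Sums data[from, to) and returns the partial sum; writes to nothing shared. */
    static final class RangeSum implements Callable<Long> {
        private final int[] data;
        private final int from;
        private final int to;

        RangeSum(int[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        public Long call() {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
    }

    /** Splits the array into roughly equal slices, one task per slice. */
    static long sum(int[] data, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<Callable<Long>>();
            int chunk = (data.length + threads - 1) / threads; // ceiling division
            for (int from = 0; from < data.length; from += chunk) {
                int to = Math.min(from + chunk, data.length);
                tasks.add(new RangeSum(data, from, to));
            }
            long total = 0;
            // invokeAll returns once every task has completed
            for (Future<Long> future : pool.invokeAll(tasks)) {
                total += future.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1000000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 100;
        }
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("total = " + sum(data, threads));
    }
}
```

The partial sums are combined from the Futures in the calling thread, 
so no atomic counter or join loop is needed.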


