[Biojava-dev] Why BJ3 should be multithreaded
Andreas Prlic
ap3 at sanger.ac.uk
Wed Apr 9 10:40:58 UTC 2008
Hi,
I like the idea of supporting multiple threads. The only catch is
that when BioJava runs on our compute farm, I am pretty sure our
admins won't be happy if BJ uses more than a single CPU, unless it
runs on special hardware. So there should be a BJ-wide configuration
setting that determines how many CPUs are used (the default could be
all of them).
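
A minimal sketch of what such a setting could look like, assuming it
is read from a JVM system property. The class name ThreadConfig and
the property name biojava.threads are hypothetical, chosen only for
illustration; they are not existing BioJava API.

```java
/** Hypothetical library-wide thread setting; not an existing BioJava class. */
final class ThreadConfig {
    /** Hypothetical property name, e.g. java -Dbiojava.threads=1 MyProgram */
    static final String PROPERTY = "biojava.threads";

    private ThreadConfig() {}

    /**
     * Returns the number of threads the library may use.
     * Defaults to all available processors when the property is
     * missing or is not a valid integer.
     */
    static int threadCount() {
        int available = Runtime.getRuntime().availableProcessors();
        // Integer.getInteger reads a system property and falls back to
        // the given default if it is absent or unparsable.
        int requested = Integer.getInteger(PROPERTY, available);
        // Never return less than one thread.
        return Math.max(1, requested);
    }

    public static void main(String[] args) {
        System.out.println("Threads to use = " + threadCount());
    }
}
```

On a shared farm the admins could then pin every job to one CPU by
starting the JVM with -Dbiojava.threads=1, while a desktop user who
sets nothing would get all cores.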
Andreas
On 9 Apr 2008, at 09:28, Andy Yates wrote:
> Lo,
>
> This is the kind of problem Java 7 is attempting to solve with the
> fork-join framework (which really is a rip-off of Google's
> MapReduce). There are two ways of looking at thread safety & how to
> implement it:
>
> * Packages which could be threaded or want to be threaded are
> programmed with threading in mind, using items from the
> java.util.concurrent package to split, queue & work with data points.
>
> * Packages can be created as required & have data passed to them
> for processing in a stateless manner, much in the same way
> servlet engines and a lot of web frameworks run.
>
> The first way means we can take advantage of environments with
> useful multi-threaded support (there is no point in threading on a
> single CPU/core box) from the word go. The second way would require
> some plumbing on the user's part, but this would be very easy
> plumbing, the majority of which we could write ourselves (like
> wrapping things in instances of Callable).
>
> Anyway my 2p worth :)
>
> Andy
>
> Mark Schreiber wrote:
>> Hi -
>> I was just playing with threads to see how efficient they are on
>> one of our old 4-CPU IBM servers. The following fairly naive
>> program splits a large array of numbers and sums them all up. The
>> multi-threaded version is 2.5 times faster, even allowing for
>> thread overhead. The program could be even better if I made more
>> use of the Java 1.5 concurrent package.
>> Similar tasks in BioJava would include training distributions,
>> which should see similar performance improvements. Much of the
>> current BioJava doesn't make use of threads and, worse, requires
>> developers to manage all the thread safety themselves.
>> - Mark
>> package concurrent;
>>
>> import java.util.concurrent.atomic.AtomicLong;
>>
>> /**
>>  * This program demos the use of threads to sum a large array of
>>  * integers.
>>  * @author Mark Schreiber
>>  */
>> public class ThreadedAdder {
>>     static int processors = Runtime.getRuntime().availableProcessors();
>>     int bigNumber = 10000000;
>>     int[] bigArray = new int[bigNumber * processors];
>>
>>     public ThreadedAdder(){
>>         //make a big array of integers
>>         //(10 000 000 numbers for each processor)
>>         for(int i = 0; i < bigArray.length; i++){
>>             //random integer between 0 and 99
>>             bigArray[i] = (int)(Math.random() * 100.0);
>>         }
>>     }
>>
>>     public void singleThreadedAdd(){
>>         //long, not int: with five or more processors the expected
>>         //sum exceeds Integer.MAX_VALUE
>>         long result = 0;
>>         //single threaded sum
>>         long start = System.currentTimeMillis();
>>         for(int number : bigArray){
>>             result += number;
>>         }
>>         long time = System.currentTimeMillis() - start;
>>         System.out.println("Calculation time = "+time+" ms");
>>         System.out.println("total = "+result);
>>     }
>>
>>     public void multiThreadedAdd() throws InterruptedException{
>>         //shared total that every thread adds its subtotal to
>>         AtomicLong total = new AtomicLong();
>>         long start = System.currentTimeMillis();
>>         AddingThread[] threads = new AddingThread[processors];
>>         for(int i = 0; i < threads.length; i++){
>>             //each thread sums its own slice of bigNumber elements
>>             threads[i] = new AddingThread("Thread "+i, i * bigNumber,
>>                     total);
>>             System.out.println(threads[i].getName()+" starting");
>>             threads[i].start();
>>         }
>>         for(Thread thread : threads){
>>             //make sure everyone is finished
>>             thread.join();
>>         }
>>         long time = System.currentTimeMillis() - start;
>>         System.out.println("Calculation time = "+time+" ms");
>>         System.out.println("total = "+total);
>>     }
>>
>>     /**
>>      * @param args the command line arguments
>>      */
>>     public static void main(String[] args) throws Exception{
>>         //how many processors do I have?
>>         System.out.println("Available processors = "+processors);
>>         System.out.println("Initializing number array");
>>         ThreadedAdder adder = new ThreadedAdder();
>>         System.out.println("single thread add");
>>         adder.singleThreadedAdd();
>>         System.out.println("multi thread add");
>>         adder.multiThreadedAdd();
>>     }
>>
>>     public class AddingThread extends Thread{
>>         long internalTotal = 0;
>>         int offSet = 0;
>>         AtomicLong callBackTotal;
>>
>>         public AddingThread(String name, int offSet,
>>                 AtomicLong callBackTotal){
>>             super(name);
>>             this.offSet = offSet;
>>             this.callBackTotal = callBackTotal;
>>         }
>>
>>         @Override
>>         public void run(){
>>             //sum privately, then publish once to limit contention
>>             for(int i = offSet; i < offSet + bigNumber; i++){
>>                 internalTotal += bigArray[i];
>>             }
>>             callBackTotal.addAndGet(internalTotal);
>>             System.out.println(this.getName()+" complete");
>>         }
>>     }
>> }
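
For comparison, here is a rough sketch of the same sum done by
wrapping each slice in a Callable and handing it to a fixed-size
thread pool from java.util.concurrent, so the number of threads is a
parameter rather than always the processor count. The class name
PooledAdder and its sum method are hypothetical, not part of BioJava
or of the program quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example class: sums an int array on a bounded thread pool. */
final class PooledAdder {
    private PooledAdder() {}

    /** Sums data using at most `threads` worker threads. */
    static long sum(final int[] data, int threads)
            throws InterruptedException, ExecutionException {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            for (int w = 0; w < threads; w++) {
                // Slice boundaries computed in long to avoid int overflow.
                final int from = (int) ((long) data.length * w / threads);
                final int to = (int) ((long) data.length * (w + 1) / threads);
                // Each Callable is stateless: it only reads its own slice.
                parts.add(pool.submit(new Callable<Long>() {
                    @Override
                    public Long call() {
                        long subtotal = 0;
                        for (int i = from; i < to; i++) {
                            subtotal += data[i];
                        }
                        return subtotal;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that slice is done
            }
            return total;
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = {1, 2, 3, 4, 5};
        System.out.println("total = " + sum(data, 2));
    }
}
```

Passing 1 as the thread count gives farm-friendly single-CPU
behaviour without a separate code path.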
-----------------------------------------------------------------------
Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
-----------------------------------------------------------------------
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.