[Biojava-dev] Why BJ3 should be multithreaded

Mark Schreiber markjschreiber at gmail.com
Wed Apr 9 11:45:16 UTC 2008


Andy is right on this: a JVM can use at most the available CPUs on one
machine (and sometimes not even that).

Unless there is a very sophisticated farm management system that makes it
look like all 100 cores are on the same machine, there is no chance that
the JVM can take over more than one machine (unless you start another whole
JVM from within your program).

On Wed, Apr 9, 2008 at 7:03 PM, Andy Yates <ayates at ebi.ac.uk> wrote:

>
>
> Most of the time, any kind of farm management software (like LSF; please
> correct me if I'm wrong) looks at the amount of CPU time a process takes up
> and the number of threads it detects, not only the number of processes you
> have in a queue. So a multi-threaded biojava should not pose a problem to
> these systems. Not to mention that on newer multi-core computers, threaded
> software is becoming the only way to take full advantage of the available
> power.
>
> Where you would want to avoid multi-threading is when you are in a queue
> like LSF and your x number of Java processes all get chucked onto the same
> machine. Then, if you've got lots of processor-hungry operations all trying
> to create threads... well, it's not going to behave as optimally as you
> might hope.
>
> Personally, though, I'd still err on the side of caution WRT multi-threading
> and not make it part of the default tools, but rather an Object I can
> instantiate to do my multi-threading work (so it's a choice at the user's
> level rather than the framework level). Then, using the Java 5 executor
> framework, we let users submit work to pools of threads. Couple this with
> forcing us to pass immutable messages between threads/callables (since
> values shared between threads are probably the number one cause of
> **** ups) and you'll have one heck of a kick-ass scalable framework
> ;-)
>
> Andy
>
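A minimal sketch of the opt-in object Andy describes, assuming Java 8 or later: the caller creates a ThreadedWorker with a thread count of its own choosing, submits Callables to a Java 5 executor, and gets back immutable GcCount results, so no mutable state is shared between threads. ThreadedWorker, GcCount and countGc are hypothetical names used for illustration only; they are not BioJava API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical immutable message: final class, final fields, no setters. */
final class GcCount {
    private final String id;
    private final int gc;
    private final int length;

    GcCount(String id, int gc, int length) {
        this.id = id;
        this.gc = gc;
        this.length = length;
    }

    String id() { return id; }
    int gc() { return gc; }
    int length() { return length; }
}

/** Hypothetical opt-in helper: the user decides to create it and how many threads it gets. */
final class ThreadedWorker implements AutoCloseable {
    private final ExecutorService pool;

    ThreadedWorker(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Runs every task on the pool and returns the results in submission order. */
    <T> List<T> runAll(List<? extends Callable<T>> tasks)
            throws InterruptedException, ExecutionException {
        List<T> results = new ArrayList<>();
        for (Future<T> f : pool.invokeAll(tasks)) {
            results.add(f.get()); // rethrows any task failure as ExecutionException
        }
        return results;
    }

    @Override
    public void close() {
        pool.shutdown(); // lets queued tasks finish, accepts no new ones
    }

    /** Each task reads only its own input and returns a new immutable GcCount. */
    static Callable<GcCount> countGc(String id, String seq) {
        return () -> {
            int gc = 0;
            for (char c : seq.toCharArray()) {
                if (c == 'G' || c == 'C') gc++;
            }
            return new GcCount(id, gc, seq.length());
        };
    }
}
```

Because each task only reads its own input string and hands back a new final object, the pool never needs locks; a failure inside a task surfaces as an ExecutionException from runAll.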
>
> Andreas Prlic wrote:
>
> > Hi,
> >
> > I like the idea of having support for multiple threads. The only thing is,
> > when running BioJava on our compute farm, I am pretty sure our admins won't
> > be happy if BJ uses more than a single CPU, unless it runs on special
> > hardware. As such, there should be a BJ-wide configuration mechanism that
> > determines how many CPUs are used (and the default could be all of them).
> >
> > Andreas
> >
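One way Andreas's BJ-wide setting could look, as a sketch: a single static lookup reads a thread count from a system property and falls back to all available processors. The property name biojava.threads and the ThreadConfig class are assumptions for illustration; BioJava does not necessarily define either.

```java
/**
 * Hypothetical BioJava-wide thread setting.
 * The property name "biojava.threads" is an illustrative assumption, not a real BioJava option.
 */
final class ThreadConfig {
    static final String PROPERTY = "biojava.threads";

    private ThreadConfig() {}

    /** Returns the configured thread count, defaulting to every available processor. */
    static int threads() {
        int available = Runtime.getRuntime().availableProcessors();
        String value = System.getProperty(PROPERTY);
        if (value == null) {
            return available; // nothing configured: use the whole machine
        }
        try {
            // Never go below one thread, even if the property is 0 or negative.
            return Math.max(1, Integer.parseInt(value.trim()));
        } catch (NumberFormatException e) {
            return available; // unparseable value: fall back to the default
        }
    }
}
```

On a shared farm node an admin could then launch the JVM with -Dbiojava.threads=1, while desktop users keep the all-cores default; any pool the library creates would be sized with ThreadConfig.threads().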
> >
> > On 9 Apr 2008, at 09:28, Andy Yates wrote:
> >
> > > Lo,
> > >
> > > This is the kind of problem Java 7 is attempting to solve with the
> > > fork-join framework (which really is a rip-off of Google's MapReduce).
> > > There are two ways of looking at thread safety and how to implement it:
> > >
> > > * Packages which could be threaded or want to be threaded are
> > > programmed with threading in mind, using items from the
> > > java.util.concurrent package to split, queue & work with data points.
> > >
> > > * Packages can be created as required & have data passed to them for
> > > processing in a stateless manner, much in the same way servlet engines
> > > and a lot of web frameworks run.
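A sketch of Andy's first approach, assuming Java 8 or later: the calling thread splits the input into items on a bounded BlockingQueue, and a fixed pool of workers takes items until each one receives a "poison pill" sentinel. QueuedGcCounter and its GC-counting work are hypothetical; the point is the split/queue/work pattern built from java.util.concurrent.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class QueuedGcCounter {
    // Sentinel telling a worker there is no more input; compared by identity, so
    // a real sequence that happens to read "<end>" is not mistaken for it.
    private static final String POISON = new String("<end>");

    /** Counts G and C characters across all sequences using a shared work queue. */
    static long countGc(List<String> sequences, int workers) throws InterruptedException {
        // Bounded, so a fast producer cannot fill memory faster than workers drain it.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1024);
        AtomicLong total = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    long local = 0;
                    for (String seq = queue.take(); seq != POISON; seq = queue.take()) {
                        for (int i = 0; i < seq.length(); i++) {
                            char c = seq.charAt(i);
                            if (c == 'G' || c == 'C') local++;
                        }
                    }
                    total.addAndGet(local); // one shared update per worker, not per base
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Producer: the calling thread splits the input into queue items.
        for (String seq : sequences) queue.put(seq);
        for (int w = 0; w < workers; w++) queue.put(POISON); // one pill per worker
        pool.shutdown();
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        return total.get();
    }
}
```

Each worker keeps a private running count and touches the shared AtomicLong only once, which keeps contention low.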
> > >
> > > The first way does mean we can support environments with useful
> > > multi-threaded support (no point in threading on a single CPU/core box) from
> > > the word go. The second way would require some plumbing on the user's behalf,
> > > but this would be very easy plumbing, the majority of which we could write
> > > (like wrapping things in instances of Callables).
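For the second approach, the plumbing might be as small as this sketch (Java 8+, hypothetical names): a stateless Function is wrapped into one Callable per input and run with ExecutorService.invokeAll, which returns futures in the same order as the tasks. The caller supplies the pool, so the user, not the library, decides how many threads are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

final class StatelessPlumbing {
    private StatelessPlumbing() {}

    /** Wraps each input in a Callable that applies the same stateless function. */
    static <I, O> List<Callable<O>> wrap(Function<? super I, ? extends O> processor, List<I> inputs) {
        List<Callable<O>> tasks = new ArrayList<>(inputs.size());
        for (I input : inputs) {
            tasks.add(() -> processor.apply(input));
        }
        return tasks;
    }

    /** Runs the wrapped tasks on a caller-supplied pool; results keep input order. */
    static <I, O> List<O> mapInParallel(Function<? super I, ? extends O> processor,
                                        List<I> inputs,
                                        ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<O> results = new ArrayList<>(inputs.size());
        for (Future<O> f : pool.invokeAll(wrap(processor, inputs))) {
            results.add(f.get());
        }
        return results;
    }
}
```

For example, mapInParallel(String::length, sequences, pool) returns the sequence lengths in input order. The wrapping is only safe if the function really is stateless, which is exactly the contract Andy's second option asks packages to honour.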
> > >
> > > Anyway, my 2p worth :)
> > >
> > > Andy
> > >
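For comparison, a sketch of the fork/join style Andy refers to, applied to an array sum like Mark's: a RecursiveTask halves its range until the range is below a threshold, forks one half and computes the other in the current thread. Fork/join shipped in Java 7; this version uses ForkJoinPool.commonPool(), which needs Java 8. The threshold is an untuned guess and ForkJoinSum is a hypothetical name.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Sums an int array by recursively splitting the index range. */
final class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 100_000; // below this size just loop (untuned guess)
    private final int[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinSum(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0; // long, so large arrays cannot overflow
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        left.fork();                                             // queue the left half for another worker
        long right = new ForkJoinSum(data, mid, to).compute();  // do the right half here
        return right + left.join();                              // wait for the left half
    }

    static long sum(int[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
    }
}
```

Unlike the fixed one-chunk-per-CPU split, the pool's work stealing balances uneven chunks automatically.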
> > > Mark Schreiber wrote:
> > >
> > > > Hi -
> > > > I was just playing with threads to see how efficient they are on one
> > > > of our old 4-CPU IBM servers. The following fairly naive program splits a
> > > > large array of numbers and sums them all up. The multi-threaded version is
> > > > 2.5 times faster even allowing for thread overhead. The program could be
> > > > even better if I made more use of the Java 1.5 java.util.concurrent package.
> > > > Similar tasks in biojava would include training distributions,
> > > > which should see similar performance improvements. Much of the current
> > > > biojava doesn't make use of threads and, worse, requires the developer to
> > > > manage all the thread safety themselves.
> > > > - Mark
> > > > package concurrent;
> > > >
> > > > import java.util.concurrent.atomic.AtomicLong;
> > > >
> > > > /**
> > > >  * This program demonstrates the use of threads to sum a large array of
> > > >  * integers.
> > > >  * @author Mark Schreiber
> > > >  */
> > > > public class ThreadedAdder {
> > > >     static int processors = Runtime.getRuntime().availableProcessors();
> > > >     int bigNumber = 10000000;
> > > >     int[] bigArray = new int[bigNumber * processors];
> > > >
> > > >     public ThreadedAdder(){
> > > >         //make a big array of integers (10 000 000 numbers for each processor)
> > > >         for(int i = 0; i < bigArray.length; i++){
> > > >             //random number between 0 and 99
> > > >             bigArray[i] = (int)(Math.random() * 100.0);
> > > >         }
> > > >     }
> > > >
> > > >     public void singleThreadedAdd(){
> > > >         //long, not int: with many processors the total can exceed Integer.MAX_VALUE
> > > >         long result = 0;
> > > >         //single threaded sum
> > > >         long start = System.currentTimeMillis();
> > > >         for(int number : bigArray){
> > > >             result += number;
> > > >         }
> > > >         long time = System.currentTimeMillis() - start;
> > > >         System.out.println("Calculation time = "+time+" ms");
> > > >         System.out.println("total = "+result);
> > > >     }
> > > >
> > > >     public void multiThreadedAdd() throws InterruptedException{
> > > >         AtomicLong total = new AtomicLong();
> > > >         long start = System.currentTimeMillis();
> > > >         AddingThread[] threads = new AddingThread[processors];
> > > >         for(int i = 0; i < threads.length; i++){
> > > >             //each thread sums its own block of bigNumber elements
> > > >             threads[i] = new AddingThread("Thread "+i, i * bigNumber, total);
> > > >             System.out.println(threads[i].getName()+" starting");
> > > >             threads[i].start();
> > > >         }
> > > >         for(Thread thread : threads){
> > > >             //make sure everyone is finished
> > > >             thread.join();
> > > >         }
> > > >         long time = System.currentTimeMillis() - start;
> > > >         System.out.println("Calculation time = "+time+" ms");
> > > >         System.out.println("total = "+total.get());
> > > >     }
> > > >
> > > >     /**
> > > >      * @param args the command line arguments
> > > >      */
> > > >     public static void main(String[] args) throws Exception{
> > > >         //how many processors do I have?
> > > >         System.out.println("Available processors = "+processors);
> > > >         System.out.println("Initializing number array");
> > > >         ThreadedAdder adder = new ThreadedAdder();
> > > >         System.out.println("single thread add");
> > > >         adder.singleThreadedAdd();
> > > >         System.out.println("multi thread add");
> > > >         adder.multiThreadedAdd();
> > > >     }
> > > >
> > > >     public class AddingThread extends Thread{
> > > >         long internalTotal = 0;
> > > >         int offSet = 0;
> > > >         AtomicLong callBackTotal;
> > > >
> > > >         public AddingThread(String name, int offSet, AtomicLong callBackTotal){
> > > >             super(name);
> > > >             this.offSet = offSet;
> > > >             this.callBackTotal = callBackTotal;
> > > >         }
> > > >
> > > >         @Override
> > > >         public void run(){
> > > >             //sum this thread's slice locally, then publish once to the shared total
> > > >             for(int i = offSet; i < offSet + bigNumber; i++){
> > > >                 internalTotal += bigArray[i];
> > > >             }
> > > >             callBackTotal.addAndGet(internalTotal);
> > > >             System.out.println(this.getName()+" complete");
> > > >         }
> > > >     }
> > > > }
> > > >
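Making more use of the Java 1.5 concurrent package could look something like this sketch (written with Java 8 lambdas for brevity; ExecutorAdder is a hypothetical name): each chunk is a Callable that returns its own partial sum through a Future, so no shared AtomicInteger is needed, and the long accumulator avoids int overflow on large arrays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ExecutorAdder {
    private ExecutorAdder() {}

    /** Splits the array into one chunk per thread and sums the chunks on a fixed pool. */
    static long sum(int[] data, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Ceiling division, so the last chunk may be shorter than the others.
            int chunk = (data.length + threads - 1) / threads;
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, data.length);
                // Value-returning lambda, so this resolves to submit(Callable).
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = from; i < to; i++) s += data[i];
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that chunk has finished
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Called as ExecutorAdder.sum(bigArray, processors), it replaces both the hand-made Thread subclass and the join loop.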
> > >
> > -----------------------------------------------------------------------
> >
> > Andreas Prlic      Wellcome Trust Sanger Institute
> >                              Hinxton, Cambridge CB10 1SA, UK
> >                              +44 (0) 1223 49 6891
> >
> > -----------------------------------------------------------------------
> >