[Biojava-dev] Why BJ3 should be multithreaded

Thu Apr 10 11:40:44 UTC 2008

> * Using concurrency to speed up a process which is not CPU limited & is part
> of the core API
>

Do you have a specific example in mind? Something blocking that needs
to be non-blocking? The parsing example could be one (as I/O blocks
during parsing), but I think it might actually be CPU limited as well.

> * Using concurrency to speed up a process which is CPU limited but can be
> sped up on machines with more than one core
>

Yes. It seems almost every modern machine is dual core nowadays, so we
should take advantage of this.

> Each scenario needs a different way of 'triggering' the concurrency. The
> first as people have said some kind of System property might be a good way
> to either enable multiple threads or disable it completely; this also needs
> to be designed with good concurrent practice in mind from the start. The

It would be good to make it configurable via the presence of a
properties file or similar. Default could be to use all available
processors, which can be determined from the Runtime object. This
approach would let users control how much of their machine's grunt is
used for heavy lifting.

This approach would also allow users to test and tune for any
installation. In recent tests I have noticed that a task has to be
reasonably expensive to be worth spawning more threads (to get a
quicker run time). The definition of expensive really depends on the
machine. One task on an old 4-CPU Linux machine got a two-fold speedup
by using all CPUs. The exact same task on a new dual-core laptop
actually slowed down, as the thread spawning was slower than the
calculation. A much harder calculation on this machine did improve
with threading.  Control of this via a property would let you set the
appropriate strategy on any deployment.
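
As a rough illustration of that idea, here is a minimal sketch that sizes a
thread pool from a system property and falls back to the processor count
reported by Runtime. The property name "biojava.threads" and the class
ThreadConfig are hypothetical, invented for this example; they are not
existing BioJava settings. The sum-of-squares job is only a stand-in for a
real calculation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadConfig {
    // Hypothetical property name, chosen for this sketch only.
    static final String THREADS_PROPERTY = "biojava.threads";

    /** Thread count from the property, defaulting to all available processors. */
    static int configuredThreads() {
        int available = Runtime.getRuntime().availableProcessors();
        // Integer.getInteger falls back to the default when the property
        // is missing or is not a valid integer.
        int requested = Integer.getInteger(THREADS_PROPERTY, available);
        return Math.max(1, requested); // never fewer than one thread
    }

    /** Stand-in workload: sums i*i for i = 1..tasks on the configured pool. */
    static long sumOfSquares(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(configuredThreads());
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (int i = 1; i <= tasks; i++) {
                final long n = i;
                jobs.add(() -> n * n);
            }
            long total = 0;
            for (Future<Long> result : pool.invokeAll(jobs)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads = " + configuredThreads());
        System.out.println("sum = " + sumOfSquares(10));
    }
}
```

Running with -Dbiojava.threads=1 would give a single-threaded baseline to
compare against on a given machine, which is the kind of per-deployment
tuning described above.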

> second way is by user intention i.e. they use the multi-threaded
> phylogenetics package.
>

Some packages should be threaded even if there is only one processor,
to prevent blocking. For example, parsing should spawn at least one
thread that is separate from the I/O thread even on a single-CPU
system, much as Swing is threaded to prevent GUI blocking.
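
A minimal sketch of what that separation could look like, assuming a
line-oriented input: one thread does the reading and hands lines over a
bounded BlockingQueue, while the calling thread does the parsing. The class
PipelinedParser and the "parse" step (taking the trimmed line length) are
placeholders invented for this example, not BioJava API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelinedParser {
    // End-of-input marker, compared by identity so it cannot collide
    // with a real line that happens to contain the same characters.
    private static final String EOF = new String("EOF");

    /** Reads lines on a separate I/O thread while the caller parses them. */
    static List<Integer> parseLengths(BufferedReader in) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(64);
        Thread reader = new Thread(() -> {
            try {
                try {
                    String line;
                    while ((line = in.readLine()) != null) {
                        queue.put(line); // blocks if the parser falls behind
                    }
                } finally {
                    queue.put(EOF); // always tell the parser to stop
                }
            } catch (IOException e) {
                e.printStackTrace(); // sketch only: real code should report this
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        List<Integer> lengths = new ArrayList<>();
        try {
            String line;
            while ((line = queue.take()) != EOF) {
                lengths.add(line.trim().length()); // stand-in for real parsing
            }
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return lengths;
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new StringReader("ACGT\nAC\n"));
        System.out.println(parseLengths(in));
    }
}
```

The bounded queue keeps the reader from running arbitrarily far ahead of the
parser, so memory use stays flat even for very large files.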

- Mark

> Does that sound okay?
>
> Andy
>
>
>
> Michael Heuer wrote:
> > On Wed, 9 Apr 2008, Andy Yates wrote:
> >
> >
> > > That is an interesting bit of usage. You could queue the events out from
> > > the feature builders into the thread/callable which constructs the final
> > > Sequence object quite easily. Yeah very very true :)
> > >
> > > The majority of objects are mutable in BJ I think. I'm not saying this
> > > is a bad thing nor suggesting everything needs to be immutable :). It's
> > > more about making sure only one thread is working on one object at a
> > > given point in the program. If there are going to be mutable objects
> > > hanging around then Queues are probably the best way to work with them.
> > >
> >
> > I am going to crib directly from the book I think Mark was referring to
> > earlier:
> >
> >  - It's the mutable state, stupid
> >
> >  All concurrency issues boil down to coordinating access to mutable
> > state.  The less mutable state, the easier it is to ensure thread safety.
> >
> >  - Make fields final unless they need to be mutable
> >
> >  - Immutable objects are automatically thread-safe
> >
> >  Immutable objects simplify concurrent programming tremendously.  They
> > are simpler and safer, and can be shared freely without locking or
> > defensive copying.
> >
> > "Java Concurrency in Practice", Goetz et al., 2006, p110.
> > http://www.javaconcurrencyinpractice.com/
> >
> >
> > The Immutable with Copy Mutators pattern provides "setter"-like methods
> > that return copies of the immutable object:
> >
> >  /**
> >   * Return a copy of this foo with the bar set to <code>bar</code>.
> >   *
> >   * <p>Foo is immutable, so there are no set methods.  Instead, this
> >   * method returns a new instance of Foo copied from <code>this</code>
> >   * with the value of bar changed.</p>
> >   *
> >   * @param bar bar for the copy of this foo
> >   * @return a copy of this foo with the bar set to <code>bar</code>
> >   */
> >  public Foo withBar(final Bar bar)
> >  {
> >    Foo copy = new Foo(..., bar);
> >    return copy;
> >  }
> >
> > This is used in JodaTime, JSR-310, and elsewhere.  I have a template I use
> > to generate classes in this style at
> >
> > http://tinyurl.com/6n2nhp
> >
> >
> >
> > >
> > > > Mark Schreiber wrote:
> > > > One area where you could get an interesting mixture of stateless and
> > > > synchronized access to a mutable would be threaded parsing of large
> > > > sequence files.  In my experience the BioJava parsers are not
> > > > normally I/O bound due to all the object building they do.  Given
> > > > this a filereader could for example read a feature block and hand it
> > > > off to a threaded stateless feature handler which produces a Feature
> > > > object and then adds it (synchronized) to the BioJava Sequence that
> > > > is being built. As long as I/O doesn't limit then you would get
> > > > improved parsing performance.  It would also be a case where the
> > > > threading should happen internally as it could be pretty hard to
> > > > coordinate the process from the outside.
> > > >
> > > > This also highlights the difference between encapsulation and
> > > > immutability. Even if access to variables is controlled by package
> > > > and protected setters the class is still mutable (but not by the
> > > > user). Immutability can only be achieved by not providing any setter
> > > > methods which has obvious severe limitations.  Currently BioJava
> > > > Sequence objects have restricted mutability (use of Edit objects) but
> > > > are certainly not immutable.
> > > >
> > > > Again messages need not be immutable as long as they have appropriate
> > > >  locks and/or synchronized getters and setters.  Many Java frameworks
> > > >  work best when messages or DTOs are beans (with parameterless
> > > > constructors and public getters and setters), being able to use these
> > > >  is often very desirable. These beans can still be threadsafe if you
> > > > code them right.
> > > >
> > >
> >
> > What might that look like?
> >
> > I have to think that in most cases these beans (DTOs, form beans,
> > etc.) are safe only because the container is managing their
> > lifecycle.
> >
> >
> > Perhaps we might want to copy some of this discussion to
> >
> > http://biojava.org/wiki/Talk:BioJava3_Design
> >
> > or a new page about concurrency issues when we are finished.
> >
> >   michael
> >
>
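
For reference, the Immutable with Copy Mutators pattern quoted above can be
fleshed out into a complete class roughly as follows. This is only a sketch:
the quoted snippet elides the other constructor arguments, so the "name"
field and the use of String for both fields are assumptions made here to get
something that compiles.

```java
public final class Foo {
    // All fields are final, so a Foo never changes after construction
    // and can be shared between threads without locking.
    private final String name; // hypothetical second field for illustration
    private final String bar;

    public Foo(final String name, final String bar) {
        this.name = name;
        this.bar = bar;
    }

    public String getName() {
        return name;
    }

    public String getBar() {
        return bar;
    }

    /** Returns a copy of this foo with the bar set to the given value. */
    public Foo withBar(final String bar) {
        return new Foo(name, bar);
    }

    public static void main(String[] args) {
        Foo original = new Foo("example", "old");
        Foo changed = original.withBar("new");
        // The original is untouched; only the copy carries the new value.
        System.out.println(original.getBar() + " / " + changed.getBar());
    }
}
```

Because withBar never modifies the receiver, a thread holding a reference to
the original instance cannot observe a half-updated object.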