[Biojava-dev] Why BJ3 should be multithreaded

Andy Yates ayates at ebi.ac.uk
Thu Apr 10 08:36:41 UTC 2008


All of that looks very reasonable to me; I really should get round to 
reading that book soon :). The only thing that worries me about the 
constructor copy is object churn but as far as I'm aware that is a worry 
from the older days of Java & doesn't hold up with the later VMs.

It seems we have two use cases for concurrency in the 'newer' biojava:

* Using concurrency to speed up a process which is not CPU limited & is 
part of the core API

* Using concurrency to speed up a process which is CPU limited but can 
be sped up on machines with more than one core

Each scenario needs a different way of 'triggering' the concurrency. For 
the first, as people have said, some kind of System property might be a 
good way to either enable multiple threads or disable them completely; 
this also needs to be designed with good concurrent practice in mind from 
the start. The second is triggered by user intention, i.e. they choose to 
use the multi-threaded phylogenetics package.
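
A minimal sketch of the System property idea for the first scenario. The
class name ConcurrencySettings and the property name "biojava.threads" are
assumptions made up for illustration; nothing like this exists in BioJava
today. A missing or invalid value falls back to a single thread, so
concurrency stays off unless somebody asks for it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: neither this class nor the property name is part of BioJava.
final class ConcurrencySettings {
    // Assumed property, set with e.g. -Dbiojava.threads=4
    static final String THREADS_PROPERTY = "biojava.threads";

    private ConcurrencySettings() {}

    /** Returns the configured thread count; 1 if the property is unset or invalid. */
    static int threadCount() {
        String value = System.getProperty(THREADS_PROPERTY);
        if (value == null) {
            return 1;
        }
        try {
            // Values below 1 are treated as "single-threaded".
            return Math.max(1, Integer.parseInt(value.trim()));
        } catch (NumberFormatException e) {
            return 1;
        }
    }

    /** Creates a fixed-size pool from the property; the caller must shut it down. */
    static ExecutorService newExecutor() {
        return Executors.newFixedThreadPool(threadCount());
    }
}
```

Core API code would then ask ConcurrencySettings.newExecutor() for its
workers instead of creating threads directly, which keeps the on/off switch
in one place.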

Does that sound okay?

Andy

Michael Heuer wrote:
> On Wed, 9 Apr 2008, Andy Yates wrote:
> 
>> That is an interesting bit of usage. You could queue the events out from
>> the feature builders into the thread/callable which constructs the final
>> Sequence object quite easily. Yeah very very true :)
>>
>> The majority of objects are mutable in BJ I think. I'm not saying this
>> is a bad thing nor suggesting everything needs to be immutable :). It's
>> more about making sure only one thread is working on one object at a
>> given point in the program. If there are going to be mutable objects
>> hanging around then Queues are probably the best way to work with them.
> 
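
A rough sketch of the queue hand-off described above, assuming plain
Strings stand in for feature blocks and built features (these are not the
BioJava Feature or Sequence types, and QueueHandoff is a made-up name).
Worker threads do the stateless building and put results on a
BlockingQueue; only the calling thread ever touches the mutable result
list, so the list itself needs no locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: Strings stand in for feature blocks and Feature objects.
final class QueueHandoff {
    private QueueHandoff() {}

    /** Builds one "feature" per block on a worker pool and collects them on the calling thread. */
    static List<String> build(List<String> blocks, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (final String block : blocks) {
                // Stateless work: each task reads only its own block.
                pool.execute(() -> queue.add("feature:" + block.trim()));
            }
            // Single consumer: the mutable list is confined to this thread.
            List<String> sequence = new ArrayList<>();
            for (int i = 0; i < blocks.size(); i++) {
                sequence.add(queue.take());
            }
            return sequence;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while building", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that features arrive in completion order, not file order, so a real
parser would need to carry a position with each item if ordering matters.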
> I am going to crib directly from the book I think Mark was referring to
> earlier:
> 
>  - It's the mutable state, stupid
> 
>   All concurrency issues boil down to coordinating access to mutable
> state.  The less mutable state, the easier it is to ensure thread safety.
> 
>  - Make fields final unless they need to be mutable
> 
>  - Immutable objects are automatically thread-safe
> 
>   Immutable objects simplify concurrent programming tremendously.  They
> are simpler and safer, and can be shared freely without locking or
> defensive copying.
> 
> "Java Concurrency in Practice", Goetz et al., 2006, p110.
> http://www.javaconcurrencyinpractice.com/
> 
> 
> The Immutable with Copy Mutators pattern provides "setter"-like methods
> that return copies of the immutable object:
> 
>   /**
>    * Return a copy of this foo with the bar set to <code>bar</code>.
>    *
>    * <p>Foo is immutable, so there are no set methods.  Instead, this
>    * method returns a new instance of Foo copied from <code>this</code>
>    * with the value of bar changed.</p>
>    *
>    * @param bar bar for the copy of this foo
>    * @return a copy of this foo with the bar set to <code>bar</code>
>    */
>   public Foo withBar(final Bar bar)
>   {
>     Foo copy = new Foo(..., bar);
>     return copy;
>   }
> 
> This is used in JodaTime, JSR-310, and elsewhere.  I have a template I use
> to generate classes in this style at
> 
> http://tinyurl.com/6n2nhp
> 
> 
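
For reference, a self-contained version of the withBar pattern quoted
above. The second field (name) is invented for illustration, and bar is an
int here rather than the Bar type in the quoted snippet, simply so the
example compiles on its own.

```java
// Hypothetical value class illustrating "immutable with copy mutators".
final class Foo {
    // All fields are final, so a Foo can be shared between threads without locking.
    private final String name;
    private final int bar;

    Foo(String name, int bar) {
        this.name = name;
        this.bar = bar;
    }

    String getName() {
        return name;
    }

    int getBar() {
        return bar;
    }

    /** Returns a copy of this foo with bar replaced; this instance is unchanged. */
    Foo withBar(int bar) {
        return new Foo(this.name, bar);
    }

    /** Returns a copy of this foo with name replaced; this instance is unchanged. */
    Foo withName(String name) {
        return new Foo(name, this.bar);
    }
}
```

Calls chain naturally, e.g. foo.withBar(2).withName("x"), at the cost of
one short-lived object per call, which is the object churn in question.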
>>> Mark Schreiber wrote:
>>> One area where you could get an interesting mixture of stateless and
>>> synchronized access to a mutable would be threaded parsing of large
>>> sequence files.  In my experience the BioJava parsers are not
>>> normally I/O bound due to all the object building they do.  Given
>>> this a filereader could for example read a feature block and hand it
>>> off to a threaded stateless feature handler which produces a Feature
>>> object and then adds it (synchronized) to the BioJava Sequence that
>>> is being built. As long as I/O doesn't limit then you would get
>>> improved parsing performance.  It would also be a case where the
>>> threading should happen internally as it could be pretty hard to
>>> coordinate the process from the outside.
>>>
>>> This also highlights the difference between encapsulation and
>>> immutability. Even if access to variables is controlled by package
>>> and protected setters the class is still mutable (but not by the
>>> user). Immutability can only be achieved by not providing any setter
>>> methods which has obvious severe limitations.  Currently BioJava
>>> Sequence objects have restricted mutability (use of Edit objects) but
>>> are certainly not immutable.
>>>
>>> Again, messages need not be immutable as long as they have appropriate
>>> locks and/or synchronized getters and setters.  Many Java frameworks
>>> work best when messages or DTOs are beans (with parameterless
>>> constructors and public getters and setters); being able to use these
>>> is often very desirable. These beans can still be thread-safe if you
>>> code them right.
> 
> What might that look like?
> 
> I have to think that in most cases these beans (DTOs, form beans, etc.)
> are safe only because the container is managing their lifecycle.
> 
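
One possible answer, as a sketch only: a bean with a parameterless
constructor whose accessors all synchronize on the bean itself. FeatureBean
and its properties are hypothetical names, not BioJava classes. Related
fields get a single combined setter, because two separately synchronized
setters would still let a reader see a half-updated range.

```java
// Hypothetical DTO; every field is guarded by the bean's own monitor.
final class FeatureBean {
    private String type;
    private int start;
    private int end;

    // Parameterless constructor, as bean-based frameworks expect.
    public FeatureBean() {}

    public synchronized String getType() {
        return type;
    }

    public synchronized void setType(String type) {
        this.type = type;
    }

    public synchronized int getStart() {
        return start;
    }

    public synchronized int getEnd() {
        return end;
    }

    /** Sets both coordinates under one lock so readers never see a mixed range. */
    public synchronized void setRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** Reads both coordinates under one lock; inclusive length of the range. */
    public synchronized int getLength() {
        return end - start + 1;
    }
}
```

Synchronized accessors only make each individual call atomic and visible
to other threads. A caller that does a check-then-act sequence across
several calls still has to hold the bean's lock (or coordinate some other
way) for the whole sequence.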
> 
> Perhaps we might want to copy some of this discussion to
> 
> http://biojava.org/wiki/Talk:BioJava3_Design
> 
> or a new page about concurrency issues when we are finished.
> 
>    michael


