[Biojava-dev] Plans for next biojava release - modularization

mark.schreiber at novartis.com mark.schreiber at novartis.com
Tue May 12 05:26:33 UTC 2009


Hi -

This was one thing we discussed previously with respect to biojava 3.
Generally I support the idea because almost all computers are now
multi-core and, as you say, cloud or utility computing is already a reality.

However, I tend to think that biojava should not control threading or
concurrency. This should be done by the developer. This is because
sometimes multithreading can be fast on a slow computer but slow on a fast
computer (due to the overhead in spawning threads), so programs need to be
tunable. Also, Java app servers and things like Sun Grid Engine, EC2, etc.
don't like people attempting to control their own threads.  What BioJava
should do is expose granular and thread-safe operations that can be
threaded, form discrete tasks on a utility grid, or complete in
SessionBeans on an app server.  For example, it would be better if BioJava
had a single-threaded method to calculate the GC content of a single
sequence rather than a multi-threaded method that calculates the GC
content of multiple sequences.  This would let the developer write a
multithreaded version if desired, or distribute multiple tasks based on
the single-threaded version to a compute cloud (and let the cloud manage
all the tasks).
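
A minimal sketch of the GC example above, assuming sequences are plain
Strings. The class GcSketch and its methods gcFraction and gcFractions are
hypothetical names for illustration and are not part of the BioJava API.
The first method is the kind of single-threaded, stateless operation a
library could expose; the second shows how a caller might add threading
on top of it with a thread count of their own choosing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of the BioJava API.
final class GcSketch {

    // Library side: single-threaded and stateless, so it is safe to call
    // from any thread or to ship to a grid as an independent task.
    static double gcFraction(String sequence) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / sequence.length();
    }

    // Developer side: the caller, not the library, picks the thread count.
    static List<Double> gcFractions(List<String> sequences, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (String s : sequences) {
                Callable<Double> task = () -> gcFraction(s);
                futures.add(pool.submit(task));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : futures) {
                results.add(f.get()); // results come back in input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints [1.0, 0.0, 0.5]
        System.out.println(gcFractions(Arrays.asList("GGCC", "ATAT", "ATGC"), 2));
    }
}
```

Because gcFraction holds no shared state, the same method could just as
well be wrapped in a grid task or a session bean instead of a thread pool.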

Possibly the best situation would be to have single-threaded, fine-grained
operations that let developers or grid engines control threading, and
then higher-level APIs that do it for you (or good cookbook examples that
show you how to do it).  Another idea that was discussed was the use of
properties files to let people set how many CPUs they want to make
available to the JVM, or to name packages that can or cannot use threading.
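
The properties-file idea could look roughly like the sketch below. The key
name "biojava.threads" and the class ThreadConfigSketch are assumptions
made up for this example; no such setting is known to exist in BioJava.
A missing or unparsable value falls back to the number of processors the
JVM reports.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical configuration helper; the property key is an assumption.
final class ThreadConfigSketch {

    static final String THREADS_KEY = "biojava.threads";

    // Returns the configured thread count, never less than 1.
    // Falls back to the JVM's processor count if the key is absent or invalid.
    static int configuredThreads(Properties props) {
        int available = Runtime.getRuntime().availableProcessors();
        String raw = props.getProperty(THREADS_KEY);
        if (raw == null) {
            return available;
        }
        try {
            return Math.max(1, Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return available;
        }
    }

    public static void main(String[] args) throws IOException {
        // In practice the properties would be loaded from a file.
        Properties props = new Properties();
        props.load(new StringReader("biojava.threads=2\n"));
        System.out.println(configuredThreads(props)); // prints 2
    }
}
```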

Finally, there are lots of times when it is highly desirable to use Java
beans because they play well with dozens of Java APIs. However, beans don't
work well with threads because their public setter methods make them
mutable.  I would like to see a lot more bean use in a future BioJava
because it would make life so much easier, but a lot of care would need to
be taken to make sure thread safety is preserved.  There are many patterns,
such as synchronization locks, that can be used to make things thread
safe, so I think this can be achieved as long as we are disciplined and
assume that every method may be used in a multi-threaded application (even
if we write the method as a single thread).  If there are code checkers
that make suggestions on thread safety, it would be great to have these as
part of the standard build process.  Good documentation would go a long way
as well.  Are there unit test patterns that can catch these problems too?
Suggestions would be great.
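
One possible pattern, sketched under assumptions: a bean whose accessors
are all synchronized on the bean itself, plus a small stress test that
releases several threads at once against it. FeatureCountBean and hammer
are hypothetical names for this example only. A test like this can raise
the odds of exposing a race, but a passing run does not prove that the
code is thread safe.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example classes; not part of the BioJava API.
final class BeanSafetySketch {

    // A bean with the usual getter/setter pair. Every access to "count"
    // goes through a method synchronized on the bean instance.
    static final class FeatureCountBean {
        private int count; // guarded by "this"

        public synchronized int getCount() { return count; }
        public synchronized void setCount(int count) { this.count = count; }
        // Compound read-modify-write must be atomic, so it is synchronized too.
        public synchronized void increment() { count++; }
    }

    // Stress test: start all workers together and count the increments.
    static int hammer(int threads, int perThread) {
        FeatureCountBean bean = new FeatureCountBean();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // wait so that all workers contend at once
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < perThread; i++) {
                    bean.increment();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return bean.getCount();
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 1000)); // prints 4000
    }
}
```

If the synchronized keyword were removed from increment(), the same test
would be expected to report a lower count on some runs, which is the kind
of failure this pattern is meant to surface.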

Progress listener patterns are good, but it depends on the situation and
might be better handled in high-level APIs or left to the developer.  For
example, in your NJ code a progress listener would be good if someone fed
1000 sequences into the method but not if they only put in 10. Also, code
running on an old machine might need a progress listener, but the same
problem on a new machine may complete almost instantly.  Probably a
pluggable listener would be the way to go.  It might also be possible to
do this using the newer JDK APIs that let you take a peek at the stack
trace (such as Thread.getStackTrace()). Even if your NJ method didn't
allow for a progress listener, a developer could still make one by looking
at the method calls in the stack. As long as your NJ method called other
methods internally for each sequence (quite likely), it would be possible
to observe the cycle of method calls from the stack.  This might make it
possible to have a very general BioJava progress listener that can be told
to count the number of times a method is called in the stack. The name of
the method would be the argument.  If the application runs in a Java app
server you can also do this very easily with a method interceptor.
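
A rough sketch of what a pluggable listener could look like. The
ProgressListener interface, the NONE constant and the totalLength method
are all invented for this example; totalLength merely stands in for a
long-running per-sequence computation such as NJ. Callers who do not want
feedback pass the no-op listener, so the core loop stays the same either
way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; this is not an existing BioJava interface.
final class ProgressSketch {

    interface ProgressListener {
        void progress(int done, int total);
    }

    // No-op listener for callers who do not need feedback.
    static final ProgressListener NONE = (done, total) -> { };

    // Stand-in for a long-running job: reports once per processed sequence.
    static int totalLength(List<String> sequences, ProgressListener listener) {
        int sum = 0;
        for (int i = 0; i < sequences.size(); i++) {
            sum += sequences.get(i).length();
            listener.progress(i + 1, sequences.size());
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> seqs = Arrays.asList("AC", "GTT");
        List<String> events = new ArrayList<>();
        int sum = totalLength(seqs, (done, total) -> events.add(done + "/" + total));
        System.out.println(sum + " " + events); // prints 5 [1/2, 2/2]
        totalLength(seqs, NONE); // same call with reporting switched off
    }
}
```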

- Mark

biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM:

> Andreas
> 
> Another theme that should be considered is providing a multi-thread
> version of any module with long run time. This would have a couple
> elements. A progress listener interface should be standard where core
> code would update progress messages to listeners that can be used by
> external code to display feedback to the user. I did this with the
> Neighbor Joining code for tree construction and it provides needed
> feedback in a GUI. If not the user gets frustrated because they don't
> know the code they are about to execute may take 10 minutes or 8 hours
> to complete and they think the software is not working. The reverse is
> also true for canceling an operation where you want to have core code
> stop processing a long running loop. Once the code has completed then
> the listener interface for process complete is called allowing the next
> step in the external code to continue. The developer would have the
> choice to call the "process" method or run it in a thread and wait for
> the callback complete method to be called. 
> 
> This is the first step in the ability to have the core/long running
> processes take advantage of multiple threads to complete the
> computational task faster. Not all code can be parallelized easily but
> if the algorithm can take advantage of running in parallel then it
> should. This then opens up a couple of cloud computing frameworks that
> extend the multi-threaded concepts in Java across a cluster
> http://www.terracotta.org/. If we put an emphasis on having code that
> runs well in a thread we are one step closer to an architecture that can
> run in a cloud. The computational problems are only going to get bigger
> and with Amazon EC2 and http://www.eucalyptus.com/ approaches
> computational IO cycles are going to be cheap as long as the
> software/libraries can easily take advantage of it.
> 
> Thanks
> 
> Scooter
> 
> -----Original Message-----
> From: biojava-dev-bounces at lists.open-bio.org
> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas
> Prlic
> Sent: Monday, May 11, 2009 12:27 AM
> To: biojava-dev
> Subject: [Biojava-dev] Plans for next biojava release - modularization
> 
> Hi biojava-devs,
> 
> It is time to start working on the next biojava release.  I  would
> like to modularize the current code base and apply some of the ideas
> that have emerged around Richard's "biojava 3" code. In principle the
> idea is that all changes should be backwards compatible with the
> interfaces provided by the current biojava 1.7 release.  Backwards
> compatibility shall only be broken if the functionality is being
> replaced with something that works better, and gets documented
> accordingly. For the build functionality I would suggest to stick with
> what Richard's biojava 3 code base already is providing. Since we will
> try to be backwards compatible all code development should be part of
> the biojava-trunk and the first step will be to move the ant-build
> scripts to a maven build process. Following this procedure will allow
> us to use, e.g., the code refactoring tools provided by Eclipse, which
> should come in handy.
> 
> The modules I would like to see should provide self-contained
> functionality and cross dependencies should be restricted to a
> minimum. I would suggest to have the following modules:
> 
> biojava-core: Contains everything that cannot easily be modularized
> or for which nobody volunteers to become a module maintainer.
> biojava-phylogeny: Scooter expressed some interest in providing such a
> module and becoming package maintainer for it.
> biojava-structure: Everything protein structure related. I would be
> package maintainer.
> biojava-blast: Blast parsing is a frequently requested functionality
> and it would be good to have this code self-contained. A package
> maintainer for this still will need to be nominated at a later stage.
> Any suggestions for other modules?
> 
> Let me know what you think about this.
> 
> Andreas
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
