[Biojava-dev] Plans for next biojava release - modularization
mark.schreiber at novartis.com
Tue May 12 05:26:33 UTC 2009
Hi -
This was one thing we discussed previously with respect to biojava 3.
Generally I support the idea, because almost all computers are now
multi-core and, as you say, cloud or utility computing is already a reality.
However, I tend to think that BioJava should not control threading or
concurrency; this should be done by the developer. One reason is that
multithreading can sometimes be fast on a slow computer but slow on a fast
computer (due to the overhead of spawning threads), so programs need to be
tunable. Also, Java app servers and environments like Sun Grid Engine,
EC2, etc. don't like applications attempting to control their own threads.
What BioJava should do is expose granular, thread-safe operations that can
be threaded, run as discrete tasks on a utility grid, or complete in
SessionBeans on an app server. For example, it would be better if BioJava
had a single-threaded method to calculate the GC content of a single
sequence rather than a multi-threaded method that calculates the GC
content of multiple sequences. This would let the developer make a
multithreaded version if desired, or distribute multiple tasks based on
the single-threaded version to a compute cloud (and let the cloud manage
all the tasks).
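
A minimal sketch of that split, assuming plain String sequences rather than any BioJava sequence type: GcCalculator.gcContent is a stateless, single-threaded operation that is safe to call concurrently, and ParallelGc.gcContentAll is developer-side code that decides for itself how many threads to use. Both class names are hypothetical, and the code uses Java 8 lambdas for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical library-side utility (not an actual BioJava class). */
final class GcCalculator {
    private GcCalculator() {}

    /**
     * Single-threaded, fine-grained and stateless: it reads only its argument,
     * so any number of threads may call it at once.
     * Returns the fraction of characters that are G or C (case-insensitive);
     * ambiguity codes such as N count toward the length but not toward GC.
     */
    static double gcContent(String sequence) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / sequence.length();
    }
}

/** Hypothetical developer-side code: the caller chooses the thread count. */
final class ParallelGc {
    private ParallelGc() {}

    static List<Double> gcContentAll(List<String> sequences, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (String s : sequences) {
                Future<Double> f = pool.submit(() -> GcCalculator.gcContent(s));
                futures.add(f);
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : futures) {
                results.add(f.get()); // results stay in input order
            }
            return results;
        } finally {
            pool.shutdown(); // let worker threads exit once tasks finish
        }
    }
}
```

The same gcContent call could just as well be wrapped in a grid job or a SessionBean method; the library itself never creates a thread.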

Possibly the best situation would be to have single-threaded, fine-grained
operations that let developers or grid engines control threading, and
then higher-level APIs that do it for you (or good cookbook examples that
show you how to do it). Another idea that was discussed was the use of
properties files to let people set how many CPUs they want to make
available to the JVM, or to name packages that can or cannot use threading.
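
One way the properties-file idea could look, as a sketch only: the keys biojava.threads and biojava.threading.disabled and the ThreadConfig class are hypothetical, not existing BioJava settings. The requested thread count is capped at what Runtime.availableProcessors() reports, and packages can be listed (comma-separated) to opt them out of threading.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/**
 * Hypothetical configuration helper. The property keys below are
 * illustrative assumptions, not settings that BioJava actually reads.
 */
final class ThreadConfig {
    static final String THREADS_KEY = "biojava.threads";
    static final String DISABLED_KEY = "biojava.threading.disabled";

    private ThreadConfig() {}

    static Properties load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return props;
    }

    /** Requested thread count, capped at the CPUs the JVM reports (minimum 1). */
    static int threadCount(Properties props) {
        int cpus = Runtime.getRuntime().availableProcessors();
        String value = props.getProperty(THREADS_KEY);
        if (value == null) {
            return cpus; // default: use every available CPU
        }
        try {
            int requested = Integer.parseInt(value.trim());
            return Math.max(1, Math.min(requested, cpus));
        } catch (NumberFormatException e) {
            return cpus; // malformed value: fall back to the default
        }
    }

    /** False if the package is named in the comma-separated disabled list. */
    static boolean threadingAllowed(Properties props, String packageName) {
        for (String entry : props.getProperty(DISABLED_KEY, "").split(",")) {
            if (entry.trim().equals(packageName)) {
                return false;
            }
        }
        return true;
    }
}
```

A file containing the line biojava.threads=2 would then limit any BioJava-managed pool to two threads on a machine with two or more CPUs.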

Finally, there are lots of times when it is highly desirable to use
JavaBeans because they play well with dozens of Java APIs; however, beans
don't work well with threads because they have public setter methods. I
would like to see a lot more bean use in a future BioJava because it would
make life so much easier, but a lot of care would need to be taken to make
sure thread safety is preserved. There are many patterns, such as
synchronization locks, that can be used to make things thread-safe, so I
think this can be achieved as long as we are disciplined and consider that
any method may be used in a multi-threaded application (even if we write
the method as single-threaded). If there are code checkers that make
suggestions on thread safety, it would be great to have them as part of
the standard build process. Good documentation would go a long way as
well. Are there unit test patterns that can catch these problems too?
Suggestions would be great.
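
A sketch of a bean that stays thread-safe by synchronizing every accessor on the instance; SequenceBean is a hypothetical class, not part of BioJava. Locking each accessor only makes each individual get or set atomic and visible to other threads, so anything that reads several properties together (here toFasta) takes the lock once for the whole read. Two separate setter calls are still not atomic as a pair.

```java
import java.io.Serializable;

/**
 * Hypothetical JavaBean: no-arg constructor, getter/setter pairs,
 * Serializable. (In real code it would be a public class in its own file
 * so that bean introspection tools can see it.)
 */
class SequenceBean implements Serializable {
    private static final long serialVersionUID = 1L;

    private String name = "";
    private String sequence = "";

    public SequenceBean() {}

    // Every accessor locks the instance, so readers see completed writes.
    public synchronized String getName() { return name; }
    public synchronized void setName(String name) { this.name = name; }

    public synchronized String getSequence() { return sequence; }
    public synchronized void setSequence(String sequence) { this.sequence = sequence; }

    /** Reads both fields under a single lock so they come from one state. */
    public synchronized String toFasta() {
        return ">" + name + "\n" + sequence;
    }
}
```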

Progress listener patterns are good, but it depends on the situation, and
they might be better handled in high-level APIs or left to the developer.
For example, in your NJ code a progress listener would be useful if someone
fed 1000 sequences into the method but not if they only put in 10. Also,
code running on an old machine might need a progress listener, while the
same problem on a new machine may complete almost instantly. Probably a
pluggable listener would be the way to go. It might also be possible to
do this using the new JDK APIs that let you take a peek at the stack
trace. Even if your NJ method didn't allow for a progress listener, a
developer could still make one by looking at the method calls on the
stack. As long as your NJ method called other methods internally for each
sequence (quite likely), it would be possible to observe the cycle of
method calls from the stack. This might make it possible to have a very
general BioJava progress listener that can be told to count the number of
times a method is called in the stack, with the name of the method as the
argument. If the application runs in a Java app server, you can also do
this very easily with a method interceptor.
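
A sketch of the stack-inspection idea using Thread.getStackTrace() (available since Java 5); StackProbe and StackProbeDemo are hypothetical names. Note the limitation: a stack trace is a snapshot of the methods active at that instant, so this counts matching frames rather than cumulative calls. A monitoring thread would have to poll the worker thread repeatedly (for example countFrames(worker, "joinNodes"), where "joinNodes" stands in for whatever per-sequence method the NJ code calls), it could miss calls that begin and end between samples, and the JVM is permitted to omit frames. An interceptor counts every invocation instead of sampling.

```java
/**
 * Hypothetical helper: counts the frames in a thread's current stack whose
 * method name matches. Matching is by method name only, ignoring the class.
 */
final class StackProbe {
    private StackProbe() {}

    static int countFrames(Thread thread, String methodName) {
        int count = 0;
        for (StackTraceElement frame : thread.getStackTrace()) {
            if (frame.getMethodName().equals(methodName)) {
                count++;
            }
        }
        return count;
    }
}

/** Demonstration only: probes its own thread from inside a recursion. */
final class StackProbeDemo {
    private StackProbeDemo() {}

    static int descend(int depth) {
        if (depth <= 1) {
            // Every active call to descend is still on the stack here.
            return StackProbe.countFrames(Thread.currentThread(), "descend");
        }
        return descend(depth - 1);
    }
}
```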
- Mark
biojava-dev-bounces at lists.open-bio.org wrote on 05/11/2009 09:50:58 PM:
> Andreas
>
> Another theme that should be considered is providing a multi-threaded
> version of any module with a long run time. This would have a couple of
> elements. A progress listener interface should be standard, through which
> core code would send progress messages to listeners that external code
> can use to display feedback to the user. I did this with the
> Neighbor Joining code for tree construction, and it provides needed
> feedback in a GUI. Otherwise the user gets frustrated: they don't know
> whether the code they are about to execute may take 10 minutes or 8 hours
> to complete, and they think the software is not working. The reverse is
> also true for canceling an operation, where you want core code to
> stop processing a long-running loop. Once the code has completed,
> the listener's "process complete" method is called, allowing the next
> step in the external code to continue. The developer would have the
> choice to call the "process" method directly or run it in a thread and
> wait for the "complete" callback method to be called.
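
A minimal sketch of the listener contract described in the quoted paragraph; ProgressListener and LongRunningTask are hypothetical names, not the actual Neighbor Joining code. The task checks an AtomicBoolean between steps so that cancel() can be called safely from another thread (for example a GUI thread), and complete reports whether the run stopped early.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical listener interface; names are illustrative only. */
interface ProgressListener {
    void progress(int done, int total, String message);

    /** Called once at the end; cancelled is true if the run stopped early. */
    void complete(boolean cancelled);
}

/** Hypothetical long-running task that reports progress and can be cancelled. */
class LongRunningTask implements Runnable {
    private final int steps;
    private final ProgressListener listener;
    private final AtomicBoolean cancelRequested = new AtomicBoolean(false);

    LongRunningTask(int steps, ProgressListener listener) {
        this.steps = steps;
        this.listener = listener;
    }

    /** Safe to call from any thread, e.g. a GUI "Cancel" button handler. */
    void cancel() {
        cancelRequested.set(true);
    }

    /** Synchronous entry point; the cancel flag is checked between steps. */
    void process() {
        int done = 0;
        while (done < steps && !cancelRequested.get()) {
            doOneStep(done);
            done++;
            listener.progress(done, steps, "step " + done + " of " + steps);
        }
        listener.complete(done < steps);
    }

    /** Lets the same task run on a background thread. */
    @Override
    public void run() {
        process();
    }

    /** Placeholder for one unit of real work; subclasses would override it. */
    protected void doOneStep(int index) {
    }
}
```

The caller can invoke process() directly, or pass the same object to new Thread(task).start() and wait for complete to be called.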
>
> This is the first step toward letting core/long-running
> processes take advantage of multiple threads to complete the
> computational task faster. Not all code can be parallelized easily, but
> if the algorithm can take advantage of running in parallel, then it
> should. This also opens up a couple of cloud computing frameworks that
> extend Java's multi-threading concepts across a cluster, such as
> http://www.terracotta.org/. If we put an emphasis on having code that
> runs well in a thread, we are one step closer to an architecture that can
> run in a cloud. The computational problems are only going to get bigger,
> and with Amazon EC2 and http://www.eucalyptus.com/ approaches,
> compute and IO cycles are going to be cheap as long as the
> software/libraries can easily take advantage of them.
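
As a sketch of algorithm-internal parallelism that still leaves the thread count to the caller: the pairwise distance matrix that Neighbor Joining starts from can be filled one row per task. ParallelDistanceMatrix is hypothetical, and it uses the simple p-distance (fraction of mismatched sites between equal-length aligned sequences) purely for illustration. Each task writes only its own row, so no locking is needed, and Future.get() guarantees the writes are visible to the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: pairwise p-distance matrix, one row per task. */
final class ParallelDistanceMatrix {
    private ParallelDistanceMatrix() {}

    /** Fraction of mismatched positions between two aligned sequences. */
    static double pDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("sequences must be aligned to equal length");
        }
        if (a.isEmpty()) {
            return 0.0;
        }
        int mismatches = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                mismatches++;
            }
        }
        return (double) mismatches / a.length();
    }

    static double[][] compute(List<String> seqs, int threads)
            throws InterruptedException, ExecutionException {
        int n = seqs.size();
        double[][] d = new double[n][n];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> rows = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int row = i;
                rows.add(() -> {
                    // Each task writes only d[row], so tasks never share cells.
                    for (int j = 0; j < n; j++) {
                        d[row][j] = pDistance(seqs.get(row), seqs.get(j));
                    }
                    return null;
                });
            }
            // invokeAll blocks until every row task has finished.
            for (Future<Void> f : pool.invokeAll(rows)) {
                f.get(); // rethrows any exception raised inside a row task
            }
        } finally {
            pool.shutdown();
        }
        return d;
    }
}
```

Because the matrix is symmetric, a real implementation would likely compute only one triangle; the full matrix is filled here to keep the sketch short.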
>
> Thanks
>
> Scooter
>
> -----Original Message-----
> From: biojava-dev-bounces at lists.open-bio.org
> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Andreas
> Prlic
> Sent: Monday, May 11, 2009 12:27 AM
> To: biojava-dev
> Subject: [Biojava-dev] Plans for next biojava release - modularization
>
> Hi biojava-devs,
>
> It is time to start working on the next biojava release. I would
> like to modularize the current code base and apply some of the ideas
> that have emerged around Richard's "biojava 3" code. In principle the
> idea is that all changes should be backwards compatible with the
> interfaces provided by the current biojava 1.7 release. Backwards
> compatibility shall only be broken if the functionality is being
> replaced with something that works better, and the change is documented
> accordingly. For the build functionality I would suggest sticking with
> what Richard's biojava 3 code base already provides. Since we will
> try to be backwards compatible, all code development should be part of
> the biojava-trunk, and the first step will be to move from the Ant build
> scripts to a Maven build process. Following this procedure will allow
> us to use, e.g., the code refactoring tools provided by Eclipse, which
> should come in handy.
>
> The modules I would like to see should provide self-contained
> functionality, and cross-dependencies should be kept to a
> minimum. I would suggest the following modules:
>
> biojava-core: Contains everything that cannot easily be modularized
> or for which nobody volunteers to become a module maintainer.
> biojava-phylogeny: Scooter expressed some interest in providing such a
> module and becoming package maintainer for it.
> biojava-structure: Everything protein structure related. I would be
> package maintainer.
> biojava-blast: Blast parsing is a frequently requested functionality,
> and it would be good to have this code self-contained. A package
> maintainer for this will still need to be nominated at a later stage.
> Any suggestions for other modules?
>
> Let me know what you think about this.
>
> Andreas
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev