[Biojava-dev] The future of BioJava

Sat Sep 22 18:42:50 UTC 2007

Richard & Andy,

   1. I like the idea of making readers more pluggable, and Dozer
   definitely looks interesting.  Is this going to be supported via the Service
   Provider Interface approach (used by Taverna and other projects)?

   2. Andy brought up the point of people who create non-standard
   variations of EMBL-formatted files.  I was wondering if these files were
   created in programming languages other than Java?  If so, would those users
   be willing to use Jython, JRuby, or a Perl-like scripting language such as
   Sleep?  This would allow them to use BioJava as a library, and still use a
   scripting language whose syntax they were familiar with.  They would also be
   producing files in a more standardized format.  This might cut down on the
   number of parsing mistakes caused by "unsupported" file variations.  You can
   go to http://scripting.dev.java.net for more information on the
   scripting languages that the Java VM supports.

   3. Was there any reason why non-standard files were being created?
   Perhaps some use-case not being covered?

   4. If BioJava is split up into a variety of smaller JARs, how would
   you ensure that users had all of the JARs they needed?  Would an
   installer be provided to allow users to select groups of JARs?  There are a
   number of open source installers that would make this process easier.
   Maven is fine if you're a developer, but if you're a scripter it's a little
   more difficult to deal with.

   5. Are there any thoughts about using a templating system like
   Velocity, FreeMarker or JST?  This would make it easier to ensure that files
   were produced in a standard fashion.  It would also make it easier to
   maintain support for writing files in different file formats.

   6. When it comes to unit testing and continuous building, is the
   bio*.org server going to handle that automated build & burn, or is someone
   in the group going to have to do it?  I think the inability to have the
   build setup on the server had us stymied before.

   7. Now that Java also includes the Derby database, and the Java
   Persistence API (JPA), has anyone considered migrating the BioSQL support
   from Hibernate to JPA, and using Derby as the default database?  This would
   make it a little easier to maintain and would minimize the setup work that a
   new user would have to do.

   8. Richard, you mention in the "Reasoning" section that "users have
   moved on".  What types of use-cases, beyond basic sequence analysis, should
   BioJava support?  Would support for more lab-related processes expand the
   user base and number of committers?  Would support for parsing different
   types of instrument files be a useful addition? I could imagine use cases
   where users would like to be able to parse an Affy file and fetch probe
   information, gene information, and perhaps pathway data.

   9. Are there any thoughts about using annotations (perhaps in
   combination with ontologies) to handle semantic validation of arguments?
   For example, you might have an annotation like

@id(ontologyURI="http://www.mygrid.org.uk/ontology#LocusLink_record_id")

indicating that the attribute or method argument is a LocusLink id.
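
On point 1: if readers were discovered through the JDK's standard SPI mechanism (java.util.ServiceLoader, available since Java 6), a minimal sketch might look like the following. SequenceReaderProvider, FastaIdReader and ReaderRegistry are hypothetical names for illustration, not existing BioJava classes. A third-party JAR would register its own reader by listing the implementing class in a META-INF/services file named after the interface's fully qualified name; readers in the core JAR are simply added directly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical plug-in contract for a sequence-file reader (not BioJava API). */
interface SequenceReaderProvider {
    /** Short format name, e.g. "fasta" or "embl". */
    String formatName();

    /** Returns the record identifiers in the input; I/O errors are rethrown unchecked. */
    List<String> readIds(BufferedReader in);
}

/** Hypothetical built-in reader: takes the first token of each FASTA header line. */
class FastaIdReader implements SequenceReaderProvider {
    public String formatName() {
        return "fasta";
    }

    public List<String> readIds(BufferedReader in) {
        List<String> ids = new ArrayList<>();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    ids.add(line.substring(1).trim().split("\\s+")[0]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ids;
    }
}

public class ReaderRegistry {
    private final List<SequenceReaderProvider> providers = new ArrayList<>();

    public ReaderRegistry() {
        // Readers shipped in the core JAR are registered directly.
        providers.add(new FastaIdReader());
        // Readers from other JARs are found through their META-INF/services entries.
        for (SequenceReaderProvider p : ServiceLoader.load(SequenceReaderProvider.class)) {
            providers.add(p);
        }
    }

    /** Returns the first reader whose format name matches (case-insensitive), or null. */
    public SequenceReaderProvider find(String format) {
        for (SequenceReaderProvider p : providers) {
            if (p.formatName().equalsIgnoreCase(format)) {
                return p;
            }
        }
        return null;
    }
}
```

With only the core JAR present, new ReaderRegistry().find("fasta") returns the built-in reader and find("embl") returns null; dropping a JAR that registers an EMBL provider onto the classpath would make the second call succeed without any change to the registry code.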
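
On point 2: the engines listed at scripting.dev.java.net plug into the JSR 223 scripting API (javax.script) that ships with Java 6. The sketch below shows how a host application could hand a Java object (for instance a BioJava sequence) to a user's script. ScriptHost and its methods are hypothetical, and the engine name to pass depends on which engine JARs are actually installed, which is why availableEngines() reports what the JVM can see.

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

/** Hypothetical helper for running user scripts against Java objects via JSR 223. */
public class ScriptHost {
    /** Describes every JSR 223 engine visible on the classpath (the list may be empty). */
    public static List<String> availableEngines() {
        List<String> engines = new ArrayList<>();
        for (ScriptEngineFactory f : new ScriptEngineManager().getEngineFactories()) {
            engines.add(f.getEngineName() + " " + f.getNames());
        }
        return engines;
    }

    /**
     * Binds value to varName, evaluates script in the named engine and returns the
     * result; returns null if no engine with that name is installed.
     */
    public static Object run(String engineName, String script, String varName, Object value) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return null;
        }
        engine.put(varName, value);
        try {
            return engine.eval(script);
        } catch (ScriptException e) {
            throw new IllegalArgumentException("Script failed: " + e.getMessage(), e);
        }
    }
}
```

Assuming Jython's JSR 223 engine is on the classpath under the name "jython", a call such as run("jython", script, "seq", someSequence) would let a user work with a BioJava object in Python syntax while BioJava does the file writing.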
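
On point 5: the idea is that each output format becomes a template, so supporting a new format means adding a template rather than writing a new writer class. The sketch below uses java.text.MessageFormat from the JDK purely as a stand-in for Velocity or FreeMarker (it does not reproduce their APIs); TemplateWriter and the "tab" format are hypothetical.

```java
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical template-driven header writer; MessageFormat stands in for a real engine. */
public class TemplateWriter {
    // One template per output format:
    // {0} = accession, {1} = description, {2} = sequence length.
    private static final Map<String, String> HEADER_TEMPLATES = new HashMap<>();
    static {
        HEADER_TEMPLATES.put("fasta", ">{0} {1}");
        HEADER_TEMPLATES.put("tab", "{0}\t{2}\t{1}");
    }

    /** Renders the header line for the given format, or throws if no template exists. */
    public static String header(String format, String accession, String description, int length) {
        String template = HEADER_TEMPLATES.get(format);
        if (template == null) {
            throw new IllegalArgumentException("No template for format: " + format);
        }
        // The length is passed as a String so MessageFormat does not apply
        // locale-dependent digit grouping (e.g. "1,859").
        return MessageFormat.format(template, accession, description, String.valueOf(length));
    }
}
```

Keeping layout in templates also means a format fix is a one-line change that every writer picks up, which is the consistency argument made above.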
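
On point 9: a runtime-retained annotation plus a small reflective check is one way this could work. In the sketch below the annotation is renamed OntologyId to follow Java naming conventions; it and SemanticValidator are hypothetical. Mapping the LocusLink term to a "positive integer" pattern is an assumption standing in for a real ontology lookup.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical annotation tying a field or argument to an ontology term. */
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.PARAMETER})
@interface OntologyId {
    String ontologyURI();
}

public class SemanticValidator {
    static final String LOCUSLINK_ID = "http://www.mygrid.org.uk/ontology#LocusLink_record_id";

    // Assumption: a real system would consult the ontology; here each term URI maps
    // to a simple syntax rule, and LocusLink ids are treated as positive integers.
    private static final Map<String, Pattern> RULES = new HashMap<>();
    static {
        RULES.put(LOCUSLINK_ID, Pattern.compile("[1-9][0-9]*"));
    }

    /** Example method whose argument is declared to be a LocusLink id. */
    public static String fetchGene(@OntologyId(ontologyURI = LOCUSLINK_ID) String locusId) {
        return "gene:" + locusId;
    }

    /** Checks args against the OntologyId rules on the named method's parameters. */
    public static void validate(Class<?> type, String methodName, Object... args) {
        for (Method m : type.getDeclaredMethods()) {
            if (!m.getName().equals(methodName)) {
                continue;
            }
            Parameter[] params = m.getParameters();
            if (params.length != args.length) {
                throw new IllegalArgumentException("Expected " + params.length + " arguments");
            }
            for (int i = 0; i < params.length; i++) {
                OntologyId tag = params[i].getAnnotation(OntologyId.class);
                Pattern rule = (tag == null) ? null : RULES.get(tag.ontologyURI());
                if (rule != null && !rule.matcher(String.valueOf(args[i])).matches()) {
                    throw new IllegalArgumentException(
                            "Argument " + i + " is not a valid " + tag.ontologyURI() + ": " + args[i]);
                }
            }
            return;
        }
        throw new IllegalArgumentException("No method named " + methodName);
    }
}
```

A caller (or a workflow engine) could run validate before dispatching, so that a gene symbol passed where a LocusLink id is expected is rejected before any database lookup happens.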

Thanks for kick-starting this discussion!

Regards,

Mark Fortner