[Biojava-l] library for running blast and formatdb]

Patrick McConnell MCCon012@mc.duke.edu
Tue, 14 Jan 2003 14:28:57 -0500


Thanks for all of the comments/suggestions.  There appear to be a lot of
libraries out there that deal with parameterization, configuration,
launching external programs, etc.  And, they all appear to do it somewhat
differently.

I have completed more work on my library for launching applications.  It
uses Program and Parameters objects like I described before, and I have
added a Queue interface and ProgramQueue implementation that performs basic
queuing functions.  I have also implemented an NCBIBlastQueue class and
have successfully tested it as a web service.  Finally, I have also
implemented a QueueQueue class that exposes the Queue interface and runs
jobs through a list of Queues.  Thus, one could link queues together.  I
have not tested this part.
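
For illustration only, the chaining idea can be sketched roughly as below,
in present-day Java.  The names JobQueue, SimpleQueue and ChainedQueue are
hypothetical stand-ins, not the actual Queue, ProgramQueue and QueueQueue
classes, and jobs are reduced to plain string transformations instead of
external programs.  The sketch also assumes that "linking" means each
queue's result is handed to the next queue in the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the Queue interface: submit an input, get a result.
interface JobQueue {
    String run(String input);
}

// Hypothetical stand-in for ProgramQueue: serves one job at a time.
class SimpleQueue implements JobQueue {
    private final UnaryOperator<String> work;
    private final List<String> history = new ArrayList<>();

    SimpleQueue(UnaryOperator<String> work) {
        this.work = work;
    }

    // synchronized, so concurrent callers are served one after another
    @Override
    public synchronized String run(String input) {
        history.add(input);
        return work.apply(input);
    }

    // Returns a copy of the inputs seen so far, in submission order.
    synchronized List<String> history() {
        return new ArrayList<>(history);
    }
}

// Hypothetical stand-in for QueueQueue: exposes the same interface and
// passes a job through a list of queues (assumption: results are chained).
class ChainedQueue implements JobQueue {
    private final List<JobQueue> stages;

    ChainedQueue(List<JobQueue> stages) {
        this.stages = new ArrayList<>(stages);
    }

    @Override
    public String run(String input) {
        String current = input;
        for (JobQueue stage : stages) {
            current = stage.run(current);
        }
        return current;
    }
}

class QueueDemo {
    public static void main(String[] args) {
        JobQueue format = new SimpleQueue(s -> "formatted(" + s + ")");
        JobQueue search = new SimpleQueue(s -> "searched(" + s + ")");
        JobQueue pipeline = new ChainedQueue(Arrays.asList(format, search));
        // prints "searched(formatted(db.fasta))"
        System.out.println(pipeline.run("db.fasta"));
    }
}
```

Because ChainedQueue is itself a JobQueue, a chain can be placed inside
another chain, which is the property that makes linking possible.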

Forwarded below is another stab at the problem that EBI has made.  After
just glancing at the docs, it looks pretty good.  And, the developer on the
project (Martin Senger) appears dedicated to moving his code to BioPerl and
BioJava.  So, that sounds good to me.  Martin: please take a look at my
implementation and see if there is anything that you like : - )

I am sure it is not the cleanest of implementations, and I am open to
suggestions/comments.  If BioJava decides to adopt something different for
its program infrastructure, then feel free to take anything from my code.

See more docs/examples at: http://www.dbsr.duke.edu/software/blast

-Patrick


---------------------- Forwarded by Patrick McConnell/CanCtr/mc/Duke on
01/14/2003 01:58 PM ---------------------------


Martin Senger <senger@ebi.ac.uk> on 01/14/2003 09:35:04 AM

To:    Patrick McConnell <MCCon012@mc.duke.edu>
cc:    Tom Oinn <tmo@ebi.ac.uk>, Alan Robinson <alan@ebi.ac.uk>

Subject:    RE: [Biojava-l] library for running blast and formatdb]

Hi Patrick,
   [ I am not (always) on the BioJava mailing-list, but I have got your
email from my colleague Tom Oinn. If you feel it appropriate, you may
forward this answer to the mailing list. Or would you, Tom? Thanks.]

   I have written a Java-based project (though not purely: it also uses
Perl launchers) dealing with starting/running/controlling external
processes (such as EMBOSS). It has advantages and disadvantages.

   Advantages are:
      - It has been around for about five years now and it's quite stable.
      - The interface (functionality) is based on an approved standard
(OMG). This includes both the methods for calling the external processes
and the XML DTD for describing command-line programs in (very) fine
detail. It also includes generators to create such XML files from
proprietary metadata of some known packages (GCG, EMBOSS).

   Disadvantages are:
      - It uses CORBA - not everybody is happy with it. A slight remedy
is that I have created an additional interface, exposed as a Web Service,
which may be used instead of CORBA (even though the current implementation
still uses CORBA under the hood).
      - It depends on some general Java classes which I wrote myself and
use in all my projects. These tools are openly available, but the
dependency makes life more difficult if I wish to put the code into
BioJava.

   My current plans, which I hope to finish during this year's
BioHackathon (end of February), are:
   - To make another Java implementation without dependencies on CORBA
(but using roughly the same API). To do the same for Perl. Both these
implementations will behave the same (will have the same interface)
regardless of whether the external process is:
      - a local program,
      - a remote program accessible via a CORBA server, or
      - a remote program accessible via a Web Service.
   I hope that the results will be accepted as contributions both to
BioPerl and BioJava (therefore I want to finalize it during the
BioHackathon, where all key developers of BioJava and BioPerl will be
present).
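
   As a rough sketch of what "the same interface regardless of where the
process runs" could look like in Java: the names ProgramRunner, RunResult
and LocalRunner below are hypothetical and are not taken from AppLab or
Soaplab. Only the local case is implemented; a CORBA-backed or Web
Service-backed runner would be further implementations of the same
interface and is deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical result holder: exit status plus captured output.
final class RunResult {
    final int exitCode;
    final String output;

    RunResult(int exitCode, String output) {
        this.exitCode = exitCode;
        this.output = output;
    }
}

// Hypothetical common interface: callers do not know where the program runs.
interface ProgramRunner {
    RunResult run(List<String> arguments);
}

// Local case: start the program as a child process on this machine.
final class LocalRunner implements ProgramRunner {
    private final String executable;

    LocalRunner(String executable) {
        this.executable = executable;
    }

    @Override
    public RunResult run(List<String> arguments) {
        List<String> command = new ArrayList<>();
        command.add(executable);
        command.addAll(arguments);
        try {
            // Merge stderr into stdout so one reader drains all output.
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .start();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (InputStream in = process.getInputStream()) {
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
            }
            int exitCode = process.waitFor();
            String text = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            return new RunResult(exitCode, text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + executable, e);
        }
    }
}

class RunnerDemo {
    // Client code depends only on the interface, never on the transport.
    static String describe(ProgramRunner runner, List<String> arguments) {
        RunResult result = runner.run(arguments);
        return "exit=" + result.exitCode + " chars=" + result.output.length();
    }

    public static void main(String[] args) {
        // Uses the running JVM's own launcher as a program that surely exists.
        String java = System.getProperty("java.home") + "/bin/java";
        System.out.println(describe(new LocalRunner(java), Arrays.asList("-version")));
    }
}
```

   Swapping LocalRunner for a remote implementation would leave
RunnerDemo.describe untouched, which is the point of keeping one API.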

   The relevant URLs are:
   http://industry.ebi.ac.uk/applab   (CORBA implementation)
   http://industry.ebi.ac.uk/soaplab  (Web Service implementation)
   http://industry.ebi.ac.uk/~senger/tools (general Java tools,
      particularly class embl.ebi.tools.Executor is interesting for this
      thread).

   Regards,
   Martin

>
>
> -------- Original Message --------
> Subject: RE: [Biojava-l] library for running blast and formatdb
> Date: Tue, 14 Jan 2003 08:32:36 -0500
> From: "Patrick McConnell" <MCCon012@mc.duke.edu>
> To: <biojava-l@biojava.org>
>
>
> What I have written provides two essential base classes: Program and
> Parameters.  The Program class provides the functionality for launching a
> program and capturing output.  I should also put in hooks for handling the
> input and output as streams as an alternative to capturing it in memory.
> The Parameters class builds command arguments based on the fields of the
> extending class using reflection.  It provides some flexibility for
> determining what the flags and delimiters look like.  There has been
> discussion to change the implementation somewhat to use jakarta's CLI
> library, and I think a hybrid of the two would be appropriate.
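
The reflection idea in the paragraph above can be sketched roughly as
follows.  This is not the code from the library under discussion: the
class names ReflectiveParameters and BlastallParameters are hypothetical,
and the rules "one flag per public field" and "null means unset" are
assumptions made for the example.  The field names mirror three blastall
flags (-p program, -d database, -e expectation value).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical base class: turns the public fields of a subclass into flags.
abstract class ReflectiveParameters {
    // Flag prefix; an assumption here, subclasses may override it.
    protected String flagPrefix() {
        return "-";
    }

    List<String> toArguments() {
        Field[] fields = getClass().getDeclaredFields();
        // getDeclaredFields() guarantees no particular order, so sort by name.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        List<String> arguments = new ArrayList<>();
        for (Field field : fields) {
            int mods = field.getModifiers();
            if (!Modifier.isPublic(mods) || Modifier.isStatic(mods) || field.isSynthetic()) {
                continue; // only public instance fields become flags
            }
            try {
                Object value = field.get(this);
                if (value == null) {
                    continue; // unset parameters are skipped
                }
                arguments.add(flagPrefix() + field.getName());
                arguments.add(String.valueOf(value));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return arguments;
    }
}

// Illustrative subclass: each public field corresponds to one flag.
class BlastallParameters extends ReflectiveParameters {
    public String p; // program name, e.g. blastp
    public String d; // database name
    public Double e; // expectation value cutoff
}

class ParametersDemo {
    public static void main(String[] args) {
        BlastallParameters params = new BlastallParameters();
        params.p = "blastp";
        params.d = "nr";
        params.e = 10.0;
        // prints "[-d, nr, -e, 10.0, -p, blastp]"
        System.out.println(params.toArguments());
    }
}
```

The resulting list could be appended to the executable name and handed
to whatever launches the process.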
>
> I have written Program and Parameters implementations for NCBI's blastall
> and formatdb programs.  Now, after chatting with Jason Stajich here at
> Duke, I am working on a flexible queueing system for Programs.  This code
> isn't complete yet, though.
>
> So, if everyone likes this framework for launching programs, I'd be glad
> to donate it to BioJava.  If people don't like it, I'll change it based on
> suggestions.  Whoever is interested, please check out:
> http://www.dbsr.duke.edu/software/blast . My code is fully documented, and
> I have added a couple of examples that demonstrate the ease of launching
> blast.
>
> As to the XML description of program parameters, I think that is a good
> idea, and can be a factory method in my Parameters class.  The method
> takes in the XML somehow (File or Stream or whatever) and returns a
> Parameters object.  But, I know that some people would prefer to handle
> the Parameters internally with code instead of externally in a File.  So,
> we should not limit ourselves to a single approach.
>
> Thanks!
>
> -Patrick
>
>
>
>
>
> "Schreiber, Mark" <mark.schreiber@agresearch.co.nz>@biojava.org on
> 01/13/2003 03:08:08 PM
>
> Sent by:    biojava-l-admin@biojava.org
>
>
> To:    "Patrick McConnell" <MCCon012@mc.duke.edu>
> cc:    <biojava-l@biojava.org>
>
> Subject:    RE: [Biojava-l] library for running blast and formatdb
>
> One thing sorely missing from BioJava is the ability to launch and
> capture the results of common bioinformatics programs. I know Java isn't
> the best at this but it's not that bad. It's also needed if you want to
> develop pipeline type applications.
>
> Would it be possible to get some kind of over-arching interface-based
> API so that services can be made available with similar interfaces?
>
> Possibly a Service or Program interface, a Parameter list or map, some
> kind of result stream?
>
> Just my $0.02
>
> - Mark
>
>  > -----Original Message-----
>  > From: Patrick McConnell [mailto:MCCon012@mc.duke.edu]
>  > Sent: Tuesday, 14 January 2003 4:15 a.m.
>  > To: biojava-l@biojava.org
>  > Subject: Re: [Biojava-l] library for running blast and formatdb
>  >
>  >
>  >
>  >
>  > >I suppose it's a matter of another external dependency vs. reinvented
>  > >utility code in biojava . . .  Would it make sense to merge the better
>  > >qualities of the two?
>  >
>  > The CLI project looks like it is quite flexible and robust.
>  > But, with this, it is somewhat complex.  This is in contrast
>  > to the simplicity of creating parameters via reflection.  I
>  > think that these two methods could be effectively combined so
>  > that we gain the simplicity of reflection with the flexibility
>  > of CLI.  The base parameters class can use CLI to build its
>  > parameters.  As an option, it can build CLI options via
>  > reflection for simplicity.  When users extend the base class,
>  > they can utilize the flexibility of CLI if they need it,
>  > otherwise they can use reflection for a quick and dirty
>  > parameter parsing.  The base class could even extend the
>  > Options class, so we are really working with a hybrid of the
>  > two.  What does everyone think?
>  >
>  > -Patrick
>  >
>  >
>  >
>  >
>  >
>  >
>  > "Michael L. Heuer" <heuermh@acm.org>@shell3.shore.net> on
>  > 01/10/2003 05:18:52 PM
>  >
>  > Sent by:    Michael Heuer <heuermh@shell3.shore.net>
>  >
>  >
>  > To:    Patrick McConnell <MCCon012@mc.duke.edu>
>  > cc:    biojava-l@biojava.org
>  >
>  > Subject:    Re: [Biojava-l] library for running blast and formatdb
>  >
>  >
>  > On Fri, 10 Jan 2003, Patrick McConnell wrote:
>  >
>  > > In the process, I developed some useful and flexible base classes for
>  > > formatting parameters and running programs.  Parameters are
>  > > automatically converted to an argument array via reflection and
>  > > reading of standard out and standard error in separate threads is
>  > > handled automatically.
>  >
>  > The base classes are nice, but I prefer the design of
>  >
>  > > http://jakarta.apache.org/commons/cli
>  >
>  > a lot better for handling parameters.
>  >
>  > I suppose it's a matter of another external dependency vs.
>  > reinvented utility code in biojava . . .  Would it make sense
>  > to merge the better qualities of the two?
>  >
>  > I also have a few simple classes for oneoff scripts with
>  > command line & logging facade support that I use all the time, see
>  >
>  > > http://www.shore.net/~heuermh/oneoff.tar.gz
>  >
>  > but they don't have any extra support for external programs.
>  >
>  >    michael
>  >
>  > >
>  > > Check it out if you are interested:
>  > > http://www.dbsr.duke.edu/software/blast/default.htm .  The full
>  > > source, javadocs, and binary class files are available.  Also, if this
>  > > seems appropriate for BioJava, I have no problem donating it to the
>  > > cause.  I think that at least the base classes, or some modification
>  > > of them, would be useful to others.
>  > >
>  > > Please email me with suggestions/comments,
>  > >
>  > > -Patrick McConnell
>  > > Duke Bioinformatics Shared Resource
>  > > mccon012@mc.duke.edu
>  > >
>  > >
>  > > _______________________________________________
>  > > Biojava-l mailing list  -  Biojava-l@biojava.org
>  > > http://biojava.org/mailman/listinfo/biojava-l
>  > >
>  >
>  >
>
>

--
Martin Senger

EMBL Outstation - Hinxton                Senger@EBI.ac.uk
European Bioinformatics Institute        Phone: (+44) 1223 494636
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger