[Biojava-l] BioInformatics toolbox.

Patrick McConnell MCCon012@mc.duke.edu
Thu, 11 Apr 2002 16:19:10 -0400


This is my first time writing to this list, so I think I should introduce
myself. My name is Patrick McConnell, and I am a scientific programmer at
the Duke Bioinformatics Shared Resource.  We work on a variety of things,
but I am currently working on a flexible Java framework for
high-throughput batch computing.

In reference to the Bioinformatics toolbox thread, I do not see complex
data types as an issue.  I have, for some time now, toyed with the idea of
setting up a GUI pipeline like this for bioinformatics web services. The
way I see it, one should be able to view the complex format of the output
of one node (e.g. nested Java interfaces or, in my case, an XML schema)
and draw pipes from particular parts of the data (like the HSP sequence in
the BLAST example) to another node.  You could also install filters
between 'actions' that sort data based on a particular field and criteria.
In addition to actions and data (nodes and pipes, in this picture), there
needs to be a special action, an end-point, that knows how to do something
intelligent with the data, like display it or save it to a file.
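
As a rough illustration of the nodes, pipes, filters, and end-points
described above, here is a minimal Java sketch. Every name in it (Action,
EndPoint, Hit, Hsp, runPipeline) is hypothetical and is not part of BioJava
or any web-services toolkit. The in-memory Hit/Hsp classes only stand in
for parsed BLAST output, and the filter's behaviour (drop low scores, then
sort by score) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PipelineSketch {
    // Hypothetical stand-ins for nested BLAST output: a hit contains HSPs.
    static final class Hsp {
        final String sequence;
        final double score;
        Hsp(String sequence, double score) { this.sequence = sequence; this.score = score; }
    }

    static final class Hit {
        final String id;
        final List<Hsp> hsps;
        Hit(String id, List<Hsp> hsps) { this.id = id; this.hsps = hsps; }
    }

    // A node with one typed input and one typed output.
    interface Action<I, O> {
        O run(I input);

        // "Draw a pipe" from this node's output to the next node's input.
        default <R> Action<I, R> pipe(Action<? super O, ? extends R> next) {
            return input -> next.run(run(input));
        }
    }

    // A terminal node that does something with the data (display, save, ...).
    interface EndPoint<T> {
        void accept(T data);
    }

    static List<String> runPipeline(List<Hit> hits, double minScore) {
        // Pipe drawn from one particular part of the nested data: the HSPs.
        Action<List<Hit>, List<Hsp>> selectHsps = input -> {
            List<Hsp> out = new ArrayList<>();
            for (Hit hit : input) out.addAll(hit.hsps);
            return out;
        };
        // Filter between actions: keep HSPs at or above minScore, best first.
        Action<List<Hsp>, List<Hsp>> filter = input -> {
            List<Hsp> out = new ArrayList<>();
            for (Hsp hsp : input) if (hsp.score >= minScore) out.add(hsp);
            out.sort(Comparator.comparingDouble((Hsp x) -> x.score).reversed());
            return out;
        };
        // Extract the field that the next node actually needs.
        Action<List<Hsp>, List<String>> sequences = input -> {
            List<String> out = new ArrayList<>();
            for (Hsp hsp : input) out.add(hsp.sequence);
            return out;
        };
        // End-point: here it just collects; it could display or write a file.
        List<String> collected = new ArrayList<>();
        EndPoint<List<String>> sink = collected::addAll;
        sink.accept(selectHsps.pipe(filter).pipe(sequences).run(hits));
        return collected;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("hit1", Arrays.asList(new Hsp("ACGT", 50.0), new Hsp("TTGA", 10.0))),
            new Hit("hit2", Arrays.asList(new Hsp("GGCC", 80.0))));
        System.out.println(runPipeline(hits, 20.0)); // prints [GGCC, ACGT]
    }
}
```

Because each Action declares its input and output types, a GUI could in
principle check a pipe for compatibility when it is drawn; with web
services, those types would presumably come from the WSDL rather than from
Java generics.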

Personally, I see web services as an ideal environment for this sort of
toolbox because web services have well-defined inputs and outputs (via a
WSDL file).  Has anyone heard of an application of this sort as applied to
web services?  Assuming no one has (and I am able to find some free time),
I plan to pursue this idea.  Would defining Java interfaces for
web services fall under the BioJava project?  What about implementations?
Has anyone attempted to define interfaces for complex data such as BLAST
output or (even worse) Medline articles?

-Patrick

>A number of people have pointed out that several GUIs exist for
>connecting components into pipelines (and I'll add my own: the
>biojava-lims code that I've been working on) and that the existing
>bio{perl,java} classes could probably be extended or wrapped to fit into
>these frameworks.  But I don't think that the combination would yield a
>viable system to let users create their own programs.
>
>When you extend bio{perl,java} classes to get components for these
>GUIs, you'd end up with two types: data and actions.  Data components
>(like java beans) would be able to describe their properties.  Action
>components would need to describe the format/requirements for their
>inputs and outputs.  Action components would get their inputs from
>- the outputs of other action components
>- user inputs
>- parameters you specify for the program.
>
>The first problem is how to format the outputs of one action as inputs
>for another.  The bio*'s solve this by providing standard interfaces
>that everything uses.
>
>The second, and I think harder, problem is that you end up with too many
>types of data objects running around and too many types of actions.
>
>Consider a pipeline where you start with GenBank accession numbers,
>fetch the sequences, BLAST the sequences against a local database, and
>do something with the sequences based on the output.  The first part is
>easy to specify.  Input is a list of strings, output is a Sequence
>object.  The sequence object goes to the BLAST component.  But having a
>GUI specify how to process the BLAST output is hard because there are
>lots of possibilities.  Trying to specify what should happen through a
>GUI seems like it would either be very confusing (e.g. a long list of
>options) or very limiting (a short list of options).
>
>I think the best way to start on a toolbox that users could use is to
>build a toolbox for programmers that provides useful components and a
>GUI.  Hopefully you have to write less and less code as time goes by, to
>the point where users could design their own process without any coding.
>
>Alex