[BioRuby] Fwd: gsoc suggestion: microframework for simple scientific web wrappers

Joachim Baran joachim.baran at gmail.com
Fri Feb 7 14:37:02 UTC 2014


I like the proposed ideas very much. Yannick’s proposal in particular is interesting in my opinion.

However, I think that none of these projects can be handled by a GSoC student. The projects are too complex, especially as they do not build upon existing architecture. It might be better to get funding for a full-time software developer and bioinformatician to tackle these problems.

Joachim

On February 7, 2014 at 8:07:03 AM, Iain Barnett (iainspeed at gmail.com) wrote:

This is obviously a good idea, but allow me to continue to be a critical  
voice so that the downsides are at least known.  

There's https://rubygems.org/gems/sinatra-param for defining parameter  
contracts. Of course, if you're calling the CLI then it should already have  
those defined, so it'd be duplicated effort; everything will be passed as a  
string from the GUI to the CLI anyway.  
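
To make the "everything arrives as a string" point concrete, here is a minimal sketch of a parameter contract that is declared once and used to coerce and validate raw form strings. It uses only the Ruby standard library and is not the sinatra-param API; the parameter names (`program`, `evalue`, `threads`), their limits, and the helper names are all hypothetical.

```ruby
# Hypothetical contract, declared once: name => [type, options].
CONTRACT = {
  "program" => [String,  { required: true, in: %w[blastn blastp] }],
  "evalue"  => [Float,   { default: 10.0, range: 0.0..1000.0 }],
  "threads" => [Integer, { default: 1, range: 1..8 }]
}.freeze

class InvalidParam < ArgumentError; end

# Turns a hash of raw strings (as a web form would send) into typed values.
def coerce_params(raw, contract = CONTRACT)
  contract.each_with_object({}) do |(name, (type, opts)), clean|
    value = raw[name].to_s.strip
    if value.empty?
      raise InvalidParam, "#{name} is required" if opts[:required]
      clean[name] = opts[:default]
      next
    end
    clean[name] = check(name, convert(name, type, value), opts)
  end
end

# Converts one string to the declared type, rejecting malformed input.
def convert(name, type, value)
  if type == Integer then Integer(value, 10)
  elsif type == Float then Float(value)
  else value
  end
rescue ArgumentError
  raise InvalidParam, "#{name} must be a valid #{type}"
end

# Applies the optional range and whitelist constraints.
def check(name, typed, opts)
  if opts[:range] && !opts[:range].cover?(typed)
    raise InvalidParam, "#{name} must be within #{opts[:range]}"
  end
  if opts[:in] && !opts[:in].include?(typed)
    raise InvalidParam, "#{name} must be one of #{opts[:in].join(', ')}"
  end
  typed
end
```

If the wrapped CLI already validates its own arguments, a table like this duplicates that work, which is the downside noted above.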

I should say again, you'll be wrapping a command line *interface* (CLI)  
with a graphical user *interface* (GUI) when, if the libraries are written  
in Ruby, you'll have access to the library's functions (at the very least  
through opening the classes), and if the library has separated concerns  
properly there will be no need to access the command line interface.  
Interfaces wrapping interfaces, in a list of possible ways to do this,  
should be one of the last. As a practical example, if you wanted to get  
data from Twitter, would you rather write a screen scraper, or access the  
data API?  

So, as an alternative, you could give a list of projects that you wished  
had a GUI. For each project:  

1. Check CLI and library are separated. If not, separate them. Not only  
does this make calling the library easier, it has the added bonus of  
improving that library and allowing others to develop against it more  
easily too. If someone wants to use it in their Rails project, they can.  
If they want a different CLI, they can write one.  
2. Make a Sinatra Extension for that library. Preferably as a route that  
only supplies data, and (possibly) a separate route for viewing the data,  
or perhaps some javascript and a view to be used. This has the added bonus  
of allowing others to supply their own views or use the extension in their  
own projects too.  
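
As a rough illustration of step 1, the sketch below keeps the library free of any argument parsing or printing, and makes the CLI a thin layer over it. The `GcContent` module, its `fraction` method, and the `--digits` option are hypothetical and not taken from any project mentioned in this thread; only the standard library is used.

```ruby
require "optparse"

# Hypothetical library: pure functions, no ARGV access, no printing.
module GcContent
  # Returns the GC fraction of a nucleotide string as a Float in 0.0..1.0.
  def self.fraction(sequence)
    bases = sequence.upcase.delete("^ACGTN") # drop anything that is not a base
    return 0.0 if bases.empty?
    bases.count("GC").to_f / bases.length
  end
end

# Thin CLI: only parses arguments and formats the library's result.
module GcContentCLI
  def self.run(argv, out: $stdout)
    digits = 3
    parser = OptionParser.new do |opts|
      opts.banner = "Usage: gc_content [options] SEQUENCE"
      opts.on("-d", "--digits N", Integer, "Decimal places") { |n| digits = n }
    end
    rest = parser.parse(argv) # returns the non-option arguments
    sequence = rest.first
    raise ArgumentError, parser.banner if sequence.nil?
    out.puts format("%.#{digits}f", GcContent.fraction(sequence))
  end
end
```

The executable script would then contain little more than `GcContentCLI.run(ARGV)`, while a web route (step 2) or a Rails project could call `GcContent.fraction` directly without going through the command line at all.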

That would improve infrastructure a lot. There are lots of things you could  
do around this, perhaps provide templates for CLI + libraries and/or  
building Sinatra extensions, write articles on how you moved one library  
from a monolithic binary to something anyone could write an interface  
against...  

Using `which` would be a security problem; letting web servers do  
this usually isn't permitted or considered good practice. I see someone else  
has been looking for the same thing and was warned about this with Galaxy  
http://www.biostars.org/p/14665/ and several problems are mentioned there.  

If you're going to call executables, I'd require them to be installed in a  
special directory local to the app, e.g. /app/bin/ which would make  
handling them easier, but this would all be mitigated by using specific  
Sinatra extensions.  
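
A minimal sketch of the app-local directory idea, assuming a hypothetical `LocalBin` helper: it refuses any name that is not a plain file name inside one directory, and it runs the program through `Open3.capture3` with an argument array so that no shell interprets user input. This is one possible approach, not an established BioRuby or Sinatra API.

```ruby
require "open3"

# Hypothetical helper: only runs executables that live directly inside
# one app-local directory (e.g. /app/bin), and never goes through a shell.
class LocalBin
  class NotAllowed < StandardError; end

  def initialize(bin_dir)
    @bin_dir = File.realpath(bin_dir)
  end

  # Returns the absolute path for +name+, or raises NotAllowed.
  def resolve(name)
    # Reject anything that is not a bare file name ("../x", "/bin/sh", "").
    unless name.match?(/\A[\w.+-]+\z/)
      raise NotAllowed, "bad name: #{name.inspect}"
    end
    path = File.join(@bin_dir, name)
    unless File.file?(path) && File.executable?(path)
      raise NotAllowed, "not an executable in #{@bin_dir}: #{name}"
    end
    # Follow symlinks and confirm the target is still inside bin_dir.
    real = File.realpath(path)
    unless File.dirname(real) == @bin_dir
      raise NotAllowed, "escapes #{@bin_dir}: #{name}"
    end
    real
  end

  # Runs the program with each argument passed as a separate string.
  # The [path, argv0] form makes Ruby exec the file directly, without a shell.
  def run(name, *args)
    exe = resolve(name)
    stdout, stderr, status =
      Open3.capture3([exe, File.basename(exe)], *args.map(&:to_s))
    [stdout, stderr, status.success?]
  end
end
```

Rejecting symlinks that point outside the directory is a deliberately strict choice here; a deployment that symlinks system binaries into the app directory would need to relax that check.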

Just my 2 pence  

Iain  

On 7 February 2014 02:28, Ben Woodcroft <donttrustben at gmail.com> wrote:  

> Sounds cool, Yannick. Some thoughts:  
>  
> * the ptools rubygem has a 'which' method, which might be of use.  
> * the version tool would in itself be useful. Including it in a script to  
> add a --version flag, automatically working out the version from  
> rubygems/git commit/whether the repo is dirty/etc. would be pretty cool, as  
> it would ease documenting versions used.  
> * bio-commandeer seems of obvious use..  
>  
> Is there any good markup language for describing program inputs and  
> outputs? Anything that could be stolen from galaxy?  
>  
> ben  
>  
>  
> On 7 February 2014 04:47, Fields, Christopher J <cjfields at illinois.edu> wrote:
>  
> > You can always mentor and discuss implementations on this list to get  
> > others' thoughts. That's the fun of it, both for the student and the  
> > community.  
> >  
> > chris  
> >  
> > On Feb 6, 2014, at 6:24 AM, Yannick Wurm <y.wurm at qmul.ac.uk> wrote:  
> >  
> > > Thanks, glad you like the idea.  
> > >  
> > > The thing is I'm not technical enough to supervise the
> > > implementation... so I could only co-supervise with the help of a
> > > strong technical thinker.
> > >  
> > > This could fit under sci-ruby's remit as well.  
> > >  
> > > Cheers,  
> > > Yannick  
> > >  
> > >  
> > >  
> > > On 6 Feb 2014, at 06:34, Pjotr Prins <pjotr.public14 at thebird.nl> wrote:
> > >  
> > >> This is a very good idea, and ties in with earlier bio-ngs work and  
> > >> our future plans in pipeline software management.  
> > >>  
> > >> GSoC also likes 'infrastructure' type projects, as was found out at
> > >> the last summit.
> > >>  
> > >> Do add it to the OBF project proposal list. Also mention bio-ngs and  
> > >> your project.  
> > >>  
> > >> Pj.  
> > >>  
> > >> On Thu, Feb 06, 2014 at 12:09:58AM +0000, Yannick Wurm wrote:  
> > >>> Dear all,  
> > >>>  
> > >>> a small thought about a potential GSoC project.  
> > >>>  
> > >>> Much bioinformatics software consists of a binary that you run on
> > >>> the command line with one or a few input files and some parameters,
> > >>> and that generates some output files. Let's consider only software
> > >>> that generates potentially human-readable output.
> > >>>  
> > >>> Most of us on this mailing list have no problem running that kind of
> > >>> software on the command-line. But for the majority of biologists
> > >>> that's still impossible: they need a point and click interface
> > >>> instead.
> > >>>  
> > >>> So if you're the person who needs to implement that point and click
> > >>> interface, how do you do it?
> > >>> 1. create a wrapper for Galaxy [1]. This has become easy, but puts
> > >>>    the burden on your end user to have or set up a Galaxy
> > >>>    installation (not trivial), and the Galaxy user experience is
> > >>>    debatable.
> > >>> 2. use sinatra.rb (we did this for our sequenceserver wrapper for
> > >>>    blast) - it worked but involved way too much manual labor.
> > >>> 3. be old-skool (build your own from php/etc).
> > >>>  
> > >>>  
> > >>> Clearly 1 isn't always appropriate & locks you into a weird
> > >>> framework, and 2 is still too much work. Padrino & rails are overkill
> > >>> for the simplest apps. With Ruby providing such great web development
> > >>> frameworks, why isn't there an easier/faster way to generate a web
> > >>> wrapper around a piece of scientific software?
> > >>>  
> > >>> Perhaps I'm missing something.  
> > >>>  
> > >>> Alternatively, creating a "wrapping scientific software" framework
> > >>> could be a viable GSoC project.
> > >>>  
> > >>> Build it upon Sinatra, create a rigid framework where the basic
> > >>> locations of files that the developer needs to edit are predetermined
> > >>> (similarly to rails). Single page/webform for the user to enter data;
> > >>> single output/download page after the run was successful. No need to
> > >>> store any user-data on the server. The framework should include the
> > >>> following features:
> > >>> * easy way to verify presence, executability and version of binary
> > >>>   (or script) that is being wrapped
> > >>> * easy way to specify number of input files, and potential
> > >>>   constraints on them [this stuff should be specified once;
> > >>>   appropriate HTML should be auto-generated (bootstrap)].
> > >>>   * most basic constraints: size and/or extension
> > >>>   * more advanced constraints: user-extensible function that
> > >>>     verifies the format
> > >>> * easy way to specify possible parameters and constraints on their
> > >>>   types
> > >>> * easy way to show/include local data (HMM models, sequence databases
> > >>>   etc...)
> > >>> * easy way to make text-output look good
> > >>>   * eg. inserting specific headers or indexing at specific regexps
> > >>>     (for table of contents)
> > >>>   * eg. csv output should be shown as a table
> > >>>  
> > >>> I'm not the best qualified person to consider exact implementation
> > >>> details, but if someone wants to go ahead with it I'm happy to
> > >>> provide more general thoughts.
> > >>>  
> > >>> Cheers,  
> > >>>  
> > >>> Yannick  
> > >>>  
> > >>> [1]: http://galaxyproject.org  
> > >>>  
> > >>>  
> > >>> -------------------------------------------------------  
> > >>> Yannick Wurm - http://yannick.poulet.org  
> > >>> Ants, Genomes & Evolution - y.wurm at qmul.ac.uk -
> > >>> skype:yannickwurm - +44 207 882 3049
> > >>> 5.03A Fogg - School of Biological & Chemical Sciences - Queen Mary,
> > >>> University of London - Mile End Road - E1 4NS London - UK
> > >>>  
> > >>>  
> > >>> _______________________________________________  
> > >>> BioRuby Project - http://www.bioruby.org/  
> > >>> BioRuby mailing list  
> > >>> BioRuby at lists.open-bio.org  
> > >>> http://lists.open-bio.org/mailman/listinfo/bioruby  
>  
> --  
> --  
> Ben Woodcroft  
> http://ecogenomic.org/users/ben-woodcroft <http://www.ecogenomic.org/>  
>  