[Bioperl-pipeline] RE: bioperl-pipeline for the small lab

Mon, 5 Aug 2002 15:53:22 -0500

Hi Shawn,

Glad you got the list started.  I have a friend down at WUSTL who is interested in this as well; it was his research that generated the process for us.  If you could add him to the list (rfreimut@im.wustl.edu), or else give me the URL so he can subscribe, that would be great.

We will have to flesh this out re the cron job.  I see that aspect as taking a normal pipeline job and then just cron'ing it, which I think is what you are saying?  So a cron wrapper on an already existing pipeline path?

Regarding 2, I can help with trying to write first passes at any runnable that I might need (it sounds simple enough), but will wait for Martin's attempt to define the interface as you suggest.

In the meantime, I think getting the infrastructure ready for the pipeline on my end would be good, maybe testing something out?  I have MySQL loaded (should I move to Postgres?  Chris Mungall seemed to say that it was better, but I haven't heard other endorsements one way or the other yet).  I have been updating the bioperl-pipeline code from anonymous CVS.  I don't have your slides (will you be posting them at the OBF or bioperl site?), but are any other servers needed?  Apache, Tomcat (guess not, since there is no Java?)  Can I check CVS for an ERD?  (I already have BioSQL loaded, but what about GFD?)  If you could send me a copy of your presentation, that would help too.

BTW, I only received this directly from you; nothing came through the bioperl-pipeline list.  Just FYI...

Thanks,
-Mat

-----Original Message-----
From: Shawn [mailto:shawnh@fugu-sg.org]
Sent: Monday, August 05, 2002 8:41 PM
To: Wiepert, Mathieu
Cc: Elia Stupka; kiran; bioperl-pipeline@bioperl.org
Subject: Re: bioperl-pipeline for the small lab

Hi Mat,
Great to hear from you. We are most interested to see how we can help
you out. We want to encourage as much use of the pipeline as possible
and that will give us tremendous support in terms of validating our
design and incorporating new features as new requirements arise.
From your mail, I will try to summarize what you propose:

1) The daemon pipeline, which I can see as a cron job, is essentially a
one-stage pipeline. But we might have one or more runnables for the
logic of doing the diffs and annotating your database based on new
hits, etc. Yup, it's interesting to me to develop this functionality,
which is very generalizable.

2) The second pipeline seems like a series of blast pipelines running
in parallel, then using the hits to run framesearch and TFASTA. We can
write bioperl wrappers for those, or you could help out too :) Wait for
the proposal of a new Bio::Tools::Run::Analysis interface, which I
think Martin Senger is proposing for writing these wrappers; it should
be quite nice and clean. Outputs can be dumped to a database or CSV as
you wish :) In terms of GFD, it seems that we are mostly storing
hits/featurepairs for your searches, which is now doable and which we
are refining. You probably also want to store your genes in the db as
well. For your sequences you may want to store them in BioSQL.

When I get back, we can come up with more concrete plans, I will try and
write up some configurations for your pipeline :)

It looks like an interesting start. FYI, we just created a
bioperl-pipeline mailing list, and I hope you don't mind that I have
cc'ed this mail to the list.

> I can write some sort of docs for you on how to do this for the small
> lab?
:) We will definitely take you up on that.

Are you still at ISMB? Let's talk some more if you want.
Great mail.

cheers,
shawn