[Bioperl-pipeline] creating jobs, job_setup

Alexandre Dehne dehneg at labri.fr
Wed Feb 18 09:36:14 EST 2004


Hi,

Please excuse me for not being clear enough in my last mail; I am
going to *try* to be more explicit here. To explain my problems, I am
now introducing three new pipelines (A, B and C) derived from the one I
mentioned in my last email (you can forget the pipeline from my last
email, I am describing the new ones entirely; the xml files are
attached).
Pipelines A and B let me show you the problem that came up with Kiran's
solution. Pipelines A and C let me explain how the rules were not
followed.

Let's start with the descriptions of the pipelines A, B and C.
First, the common structure of A, B and C:
run order of analyses: 1->2->3->4->5
analysis 1 : datamonger
analyses 2 and 3 : analyses that take no input and produce no output
(using the CAAT-Box program in my case).
analysis 4 : datamonger
analysis 5 : analysis that needs an input (from analysis 4), using blast
for example.


Now the differences between the analyses of A, B and C:

# analysis 1 from A and C (my old solution) : a datamonger that uses an
InputCreate module named setup_nothing which creates a "void" input like
the following :
  my @input=$self->create_input("nothing",'',"infile");
  my $job   = $self->create_job($next_anal,\@input);
  $self->dbadaptor->get_JobAdaptor->store($job);

# analysis 1 from B (Kiran's solution) : a datamonger that uses an
InputCreate module named setup_nothing_kiran which creates a job without
any input, like the following :
   my $job   = $self->create_job($next_anal);
   $self->dbadaptor->get_JobAdaptor->store($job);
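
For reference, here is roughly what such an InputCreate module looks
like as a whole. This is only a sketch: I am assuming the module
inherits from Bio::Pipeline::InputCreate and is driven through a run()
method that receives the next analysis, as the snippets above suggest
(only the two create_job call styles matter here):

   package Bio::Pipeline::InputCreate::setup_nothing;

   use strict;
   use base qw(Bio::Pipeline::InputCreate);

   sub run {
       my ($self, $next_anal) = @_;

       # variant used in pipelines A and C: store a dummy "nothing" input
       my @input = $self->create_input("nothing", '', "infile");
       my $job   = $self->create_job($next_anal, \@input);

       # variant used in pipeline B (Kiran's suggestion): no input at all
       # my $job = $self->create_job($next_anal);

       $self->dbadaptor->get_JobAdaptor->store($job);
       return 1;
   }

   1;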

# analyses 2 and 3 (from A, B and C) : nothing special, except that the
<action> tag has to be "COPY_ID_FILE" in A and B, but can be either
"COPY_ID_FILE" or "COPY_ID" in C.
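
For reference, such a rule looks roughly like this in the xml (just a
sketch with illustrative analysis ids; the exact element names are the
ones already used in the rule sections of the attached A.xml, B.xml and
C.xml):

   <rule>
      <current_analysis_id>2</current_analysis_id>
      <next_analysis_id>3</next_analysis_id>
      <action>COPY_ID_FILE</action>  <!-- or COPY_ID in pipeline C -->
   </rule>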

# analysis 4 from A and B : a datamonger using an InputCreate module
which creates jobs with an input (for example the module setup_initial).
Please note that, contrary to analysis 4 from C, the <input> tag is
outside the <data_monger> tag:
...
   <analysis id="4">
      <input>
         <name>$input_description</name>
         <iohandler>1</iohandler>
      </input>
      <data_monger>
        <input_create>
          <module>setup_initial</module>
          <rank>1</rank>
 ....

# analysis 4 from C : a datamonger using an InputCreate module which
creates jobs with an input (for example the module setup_initial).
Please note that, contrary to the analysis 4 from A and B, the <input>
tag is inside the <data_monger> tag :
...
   <analysis id="4">
      <data_monger>
        <input>
          <name>$input_description</name>
          <iohandler>1</iohandler>
        </input>
        <input_create>
          <module>setup_initial</module>
          <rank>1</rank>
 ....




# analysis 5 (from A, B and C): nothing special, the same blast analysis
as in the example blast_db_file.xml.





Okay, now you know all about the pipelines A, B and C.
Pipeline A is the one I previously used to create void inputs (via the
module setup_nothing) and pipeline B is Kiran's way (via the module
setup_nothing_kiran). I really like Kiran's solution, but without any
input, analysis 4 returns the following error:
======== "
  READING: Lost the will to live Error. Problems with runnableDB
fetching input
   [
   ------------- EXCEPTION: Bio::Root::Exception -------------
   MSG: Runnable Bio::Pipeline::Runnable::DataMonger=HASH(0x89f1b98)
cannot call
   STACK: Error::throw
   STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.0/Bio/Root/Root.pm:342
   STACK: Bio::Pipeline::RunnableDB::setup_runnable_inputs
/var/opt/Genolevures/src/biopipe-bundle-0.1/bioperl-pipeline/Bio/Pipeline/RunnableDB.pm:244
   STACK: Bio::Pipeline::RunnableDB::fetch_input
/var/opt/Genolevures/src/biopipe-bundle-0.1/bioperl-pipeline/Bio/Pipeline/RunnableDB.pm:485
   ......
" ============

My questions concerning Kiran's solution are:
- How can I handle this problem (using the data monger after an analysis
which outputs nothing)?
- Is there another value for the <action> tag (different from "COPY_ID"
or "COPY_ID_FILE") which does not rely on copying an input (which does
not exist in my case)?




Those were the questions concerning pipelines A and B. Now I focus on
pipelines A and C to show how the rules are not followed. Indeed, when
running pipeline C, analysis 4 starts just after analysis 1, which
contradicts the rules. Do you have an explanation? In any case, I worked
around this problem by using pipeline A, i.e. by writing the <input> tag
outside the <data_monger> tag for analysis 4.



I hope I am clear enough.
Thanks in advance.


Alexandre






On Fri, 2004-02-13 at 09:18, Shawn Hoon wrote:
> Hi Alexandre,
> 	okay this is what I gather you are trying to do:
> 
> Run analysis 1 -> 2 -> 3 -> 4
> 
> The question is: what are your inputs? Are you running the four
> analyses on the same input type? For example, you have four blast
> analyses that you run on sequences? If so, then what you would do is
> use an input create/data monger to create inputs for analysis 1. Then
> in your rules you would specify COPY_ID for analyses 1 -> 2, 2 -> 3 and
> 3 -> 4, and the input id will be transferred between analyses.
> 
> If your input for analysis 2, for example, is different from that of
> analysis 1, then you need to do something different.
> For this, there are 2 options:
> 
> 	1) If you require that analysis 1 is completed before 2 is
> completed, then you need an analysis in between 1 and 2 (so as a
> result 2 becomes 3).
> 	     Analysis 2 would now be an input_create which knows how to
> create inputs for analysis 2. (Basically we are assuming that this
> input creation is linked to input 1 of analysis 1.)
> 	2) If you require that all of the inputs from analysis 1 are
> completed before any analysis 2 jobs are started, you can do a rule
> WAITFORALL which would then launch a job of analysis 2 (which may or
> may not be an input create).
> 
> for your definition below, I don't see why analysis 4 should be 
> executed at startup. Can you provide the xml file?
> shawn
> 
> 
> On Feb 12, 2004, at 5:48 AM, Alexandre Dehne wrote:
> 
> > Hi Kiran,
> >
> > Thank you for answering me.
> > Actually, your solution is very clean but, by using it, other problems
> > came up.
> >
> > Here is the current situation:
> > So, I start a job on my first analysis with your suggestion. Then,
> > more jobs on other analyses are created by placing
> > "<action>COPY_ID_FILE</action>" or "<action>COPY_ID</action>" in their
> > respective rules in the XML file.
> > (Remember that for now, none of my analyses take any input or give any
> > output. This way, everything is fine and works well.)
> >
> > The problem comes when I want to use an analysis that needs an
> > input. For that, I am using the data monger. Since the data monger
> > itself needs an input, it does not work. So, I am trying to create
> > this input by using the following <input> tag:
> >
> > ...
> >    <analysis id="4">
> >       <data_monger>
> >         <input>
> >           <name>$input_description</name>
> >           <iohandler>1</iohandler>
> >         </input>
> >         <input_create>
> > ....
> >
> >
> > My initial data monger (analysis N.1) and the one previously described
> > (analysis N.4) are now called at the beginning of the pipeline.
> > But analysis N.4 has to be called after the third one, as I specified
> > in the rules.
> >
> > Do you have any suggestion on how to solve my problem and why the rules
> > are not followed ?
> > Please let me know if I am not clear.
> >
> > Thank you in advance,
> >
> > Alexandre
> >
> >
> >
> > On Wed, 2004-02-11 at 00:30, Kiran Kumar wrote:
> >> Hi Alexandre,
> >> It's nice to know that it fits into your work.
> >>
> >> In short, you would be able to create a job without inputs. The
> >> direct way would be 'not to pass' any inputs to the "create_job"
> >> function.
> >>
> >>       my $job   = $self->create_job($next_anal);
> >>       $self->dbadaptor->get_JobAdaptor->store($job);
> >> That should make it 'righteous' :-)..
> >>
> >> Since you are following the Biopipe spirit, let me go on to explain 
> >> the
> >> other aspects too.
> >>
> >>
> >> On the xml level, you are right that the <job_setup> tag could be
> >> used for this purpose.
> >> The <job_setup> provides for specifying jobs directly inside the XML
> >> file without using a Datamonger/InputCreate. Of course, this is
> >> convenient if the number of jobs is only a handful, which would
> >> otherwise make the XML file very lengthy. This feature is still there
> >> but has not been tested for a long time. We have stopped using this
> >> feature because of a drawback it poses to the biopipe spirit, which
> >> is as follows.
> >>
> >> If the job needs inputs and they are specified using job_setup
> >> options, the xml file becomes too specific, and anyone else trying to
> >> re-use it would have to change all the input_ids each time they need
> >> to run it on a different set of inputs. The datamonger/InputCreate,
> >> on the other hand, provides a clean separation of input names from
> >> the xml pipeline specification. The InputCreates are expected to read
> >> the input_names for the jobs they are going to create from a file or
> >> directory or somewhere (this location for the input_names is
> >> specified in the input_create's parameters in the xml file).
> >>
> >> Hope I haven't left you more confused than before!
> >>
> >> Cheers,
> >> Kiran
> >>
> >>
> >>> Hi,
> >>>
> >>> First, I would like to congratulate the Biopipe team for having 
> >>> created such a useful tool.
> >>>
> >>>
> >>> The context :
> >>> For several reasons (some good and some not so good), some of my
> >>> runnables take no input and return nothing.
> >>>
> >>> The problem :
> >>> This type of runnable does not match the biopipe "spirit", so it is
> >>> a problem to create jobs for these runnables via the "create_job"
> >>> function, which needs an array input.
> >>>
> >>> The "temporary" unrighteous solution :
> >>> I have created an InputCreate module named setup_nothing which 
> >>> creates a void input like the following :
> >>>    my @input=$self->create_input("nothing",'',"infile");
> >>>    my $job   = $self->create_job($next_anal,\@input);
> >>>    $self->dbadaptor->get_JobAdaptor->store($job);
> >>> This way, I launch one job on my analysis as well as on the 
> >>> following ones by placing "<action>COPY_ID_FILE</action>" in their 
> >>> respective rules in the XML file.
> >>>
> >>>
> >>> The questions :
> >>> Is there a clean way to create jobs without any input (a just_do_it
> >>> function?)?
> >>> Perhaps the <job_setup> tag in the XML file?
> >>> Also, could someone tell me more about this <job_setup> tag?
> >>>
> >>>
> >>> Thank you in advance
> >>>
> >>>
> >>> Alexandre
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> bioperl-pipeline mailing list
> >>> bioperl-pipeline at bioperl.org
> >>> http://bioperl.org/mailman/listinfo/bioperl-pipeline
> >>>
> >>
> >
> > _______________________________________________
> > bioperl-pipeline mailing list
> > bioperl-pipeline at bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-pipeline
> >
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: A.xml
Type: text/xml
Size: 5858 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-pipeline/attachments/20040218/ebda9df9/A-0001.xml
-------------- next part --------------
A non-text attachment was scrubbed...
Name: B.xml
Type: text/xml
Size: 5936 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-pipeline/attachments/20040218/ebda9df9/B-0001.xml
-------------- next part --------------
A non-text attachment was scrubbed...
Name: C.xml
Type: text/xml
Size: 5881 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-pipeline/attachments/20040218/ebda9df9/C-0001.xml

