From jeremyp at sgx3.bmb.uga.edu  Tue Jun 3 01:14:11 2003
From: jeremyp at sgx3.bmb.uga.edu (jeremyp@sgx3.bmb.uga.edu)
Date: Tue Jun 3 00:14:03 2003
Subject: [Bioperl-pipeline] input creates/update action
Message-ID: <34761.68.117.208.63.1054613651.squirrel@sgx3.bmb.uga.edu>

Hello,

I'm trying to write an input_create (or several input_creates) to replace
the update action. Unfortunately, I'm having trouble writing a working
example. I wrote an xml file similar to the cdna2genome configuration file.
The problem is that the pipeline script loads the second data monger/input
create into the job table before the pipeline runs at all. So, that job
(the second data monger/input create job) runs before the analysis that
should precede it (that is, the data monger analysis is analysis id 3, and
there is another normal analysis with analysis id 2). Exactly mimicking the
update action would in fact require the preceding analysis to generate the
input that the data monger/input create job (analysis id 3) would use.
Where am I going wrong?

Thanks,
Jeremy

From shawnh at fugu-sg.org  Tue Jun 3 13:35:01 2003
From: shawnh at fugu-sg.org (Shawn Hoon)
Date: Tue Jun 3 00:34:42 2003
Subject: [Bioperl-pipeline] input creates/update action
In-Reply-To: <34761.68.117.208.63.1054613651.squirrel@sgx3.bmb.uga.edu>
Message-ID: 

Hi Jeremy,

The 2nd datamonger job shouldn't get created. An input is only created if
the input tag is placed within the datamonger tag. In cdna2genome.xml, for
example, the first DataMonger analysis has an input (input_file) inside its
datamonger block, whereas the second datamonger, analysis 3
(setup_cdna2genome), has no input there. So if you leave it out for your
analysis id 3 then the input won't get created.
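Very roughly, the layout is something like the following -- I'm writing the
tags from memory, so treat this as a sketch of where the pieces go rather
than an exact copy of cdna2genome.xml (module names, ranks and arguments in
your copy may differ):

   <analysis id="1">
     <data_monger>
       <input>
         <name>input_file</name>
       </input>
       <input_create>
         <module>setup_file</module>
         <rank>1</rank>
       </input_create>
     </data_monger>
     ...
   </analysis>

   <analysis id="3">
     <data_monger>
       <!-- no <input> element here, so no job is pre-loaded -->
       <input_create>
         <module>setup_cdna2genome</module>
         <rank>1</rank>
       </input_create>
     </data_monger>
     ...
   </analysis>

It is the <input> element inside the datamonger that makes the loader
create an input (and hence a job) at setup time; the downstream datamonger
gets its inputs from the preceding analysis instead.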
hope that helps

shawn

On Tuesday, June 3, 2003, at 12:14 PM, jeremyp@sgx3.bmb.uga.edu wrote:

> Hello,
>
> I'm trying to write an input_create (or several input_creates) to
> replace the update action. Unfortunately, I'm having trouble writing a
> working example. I wrote an xml file similar to the cdna2genome
> configuration file. The problem is that the pipeline script loads the
> second data monger/input create into the job table before the pipeline
> runs at all. So, that job (the second data monger/input create job)
> runs before the analysis that should precede it (that is, the data
> monger analysis is analysis id 3, and there is another normal analysis
> with analysis id 2). Exactly mimicking the update action would in fact
> require the preceding analysis to generate the input that the data
> monger/input create job (analysis id 3) would use. Where am I going
> wrong?
>
> Thanks,
> Jeremy
> _______________________________________________
> bioperl-pipeline mailing list
> bioperl-pipeline@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-pipeline
>

From jeremyp at sgx3.bmb.uga.edu  Tue Jun 3 16:00:13 2003
From: jeremyp at sgx3.bmb.uga.edu (jeremyp@sgx3.bmb.uga.edu)
Date: Tue Jun 3 15:00:19 2003
Subject: [Bioperl-pipeline] input creates/update action
In-Reply-To: 
References: <34761.68.117.208.63.1054613651.squirrel@sgx3.bmb.uga.edu>
Message-ID: <35092.128.192.15.158.1054666813.squirrel@sgx3.bmb.uga.edu>

> Hi Jeremy,
>
> The 2nd datamonger job shouldn't get created. An input is only created
> if the input tag is placed within the datamonger tag, like in
> cdna2genome.xml.
>
> So if you leave it out for your analysis id 3 then the input won't get
> created.
>
> hope that helps
>
> shawn

Excellent. Thanks. I had it set up that way at first... however, I was
getting this exception: "MSG: Need an input name to create input".
Unfortunately, I never explored where that was coming from. I thought it
indicated that I needed to specify an input section in the xml file.
Instead, it was because my InputCreate module was not written correctly,
so I was calling the create_input function with an undefined first
argument (in my InputCreate module)...

Thanks again,
Jeremy

From juguang at tll.org.sg  Wed Jun 4 11:34:16 2003
From: juguang at tll.org.sg (Juguang Xiao)
Date: Tue Jun 3 22:34:00 2003
Subject: [Bioperl-pipeline] Fwd: Pipeline masters, help!
Message-ID: <09312098-9635-11D7-8467-000A957702FE@tll.org.sg>

Begin forwarded message:

> From: Luo Ming
> Date: Tue Jun 3, 2003 7:03:35 PM Asia/Singapore
> To: bioinformatics@tll.org.sg
> Subject: Pipeline masters, help!
>
> I was trying to run a simple blast pipeline and encountered a "Bus
> error". What is that?
>
> perl ~luoming/src/bioperl-pipeline/scripts/PipelineManager -dbname
>   luoming_test_pipe -xml
>   /R5_capricorn/luoming/test_pipe/blast_file_pipeline.xml -schema
>   ~luoming/src/bioperl-pipeline/sql/schema.sql -local
>
> Creating luoming_test_pipe
> Loading Schema...
> Reading Data_setup xml : blast_file_pipeline.xml
> Doing DBAdaptor and IOHandler setup
> Doing Pipeline Flow Setup
> Doing Analysis..
> Doing Rules
> Doing Job Setup...
> Loading of pipeline luoming_test_pipe completed
> 3 analysis found.
> Running test and setup..
>
> //////////// Analysis Test ////////////
> Checking Analysis 1 DataMonger
> -------------------- WARNING ---------------------
> MSG: Skipping test for DataMonger
> ---------------------------------------------------
> ok
> Checking Analysis 2 Blast ok
> Checking Analysis 3 Blast ok
> Fetching Jobs...
> Fetched 1 incomplete jobs
> Running job /tmp//0/luoming_test_pipe_DataMonger.1054637610.690.out
>   /tmp//0/luoming_test_pipe_DataMonger.1054637610.690.err
> Bus error

From shawnh at fugu-sg.org  Wed Jun 25 16:03:27 2003
From: shawnh at fugu-sg.org (Shawn Hoon)
Date: Thu Jun 26 03:01:44 2003
Subject: [Bioperl-pipeline] Biopipe website
Message-ID: 

Hi Folks,

The biopipe website has been down for a while, as the IP address of our
server has changed. We are currently waiting for the http://www.biopipe.org
domain to point to the open-bio server. If you would like to access it for
now, you may use either

http://biopipe.open-bio.org

or

http://biopipe-tmp.open-bio.org

You may also add this line to /etc/hosts

155.94.54.81    biopipe.open-bio.org

if the change hasn't propagated yet. Let me know if there are any problems.

cheers,

Shawn

From jeremyp at sgx3.bmb.uga.edu  Thu Jun 26 15:17:31 2003
From: jeremyp at sgx3.bmb.uga.edu (jeremyp@sgx3.bmb.uga.edu)
Date: Thu Jun 26 14:17:14 2003
Subject: [Bioperl-pipeline] web, restart
Message-ID: <36707.128.192.15.158.1056651451.squirrel@sgx3.bmb.uga.edu>

Hi,

I remember there was some talk about including software to allow for
running pipelines through a web interface. Has this been done?

Also, is there a way to restart the PipelineManager? That is, if the
PipelineManager were killed but I wanted the pipeline it was running when
killed to continue running at a later date, is there a way to do this?
Thanks,
Jeremy

From juguang at tll.org.sg  Fri Jun 27 11:38:14 2003
From: juguang at tll.org.sg (Juguang Xiao)
Date: Thu Jun 26 22:38:02 2003
Subject: [Bioperl-pipeline] web, restart
In-Reply-To: <36707.128.192.15.158.1056651451.squirrel@sgx3.bmb.uga.edu>
Message-ID: <66D7FD6E-A848-11D7-AB59-000A957702FE@tll.org.sg>

On Friday, June 27, 2003, at 02:17 AM, jeremyp@sgx3.bmb.uga.edu wrote:

> Hi,
>
> I remember there was some talk about including software to allow for
> running pipelines through a web interface. Has this been done?
>

Yes, we are doing that right now. Do you have any ideas to offer? :-)

> Also, is there a way to restart the PipelineManager? That is, if the
> PipelineManager were killed but I wanted the pipeline it was running
> when killed to continue running at a later date, is there a way to do
> this?
>

If you killed the PipelineManager process and the pipeline database still
exists, you can simply re-run it, specifying the same database. If your
configuration is wrong or something of that sort, so that the
PipelineManager quit by itself, then you probably need to correct your
configuration (e.g. the xml file), flush the database, import the corrected
one and run it again.

Juguang

> Thanks,
> Jeremy
> _______________________________________________
> bioperl-pipeline mailing list
> bioperl-pipeline@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-pipeline
>

------------ATGCCGAGCTTNNNNCT--------------
Juguang Xiao
Temasek Life Sciences Laboratory, National University of Singapore
1 Research Link, Singapore 117604
juguang at tll.org.sg

From shawnh at fugu-sg.org  Fri Jun 27 11:43:43 2003
From: shawnh at fugu-sg.org (Shawn Hoon)
Date: Thu Jun 26 22:42:03 2003
Subject: [Bioperl-pipeline] web, restart
In-Reply-To: <36707.128.192.15.158.1056651451.squirrel@sgx3.bmb.uga.edu>
Message-ID: 

On Thursday, June 26, 2003, at 07:17 PM, jeremyp@sgx3.bmb.uga.edu wrote:

> Hi,
>
> I remember there was some talk about including software to allow for
> running pipelines through a web interface. Has this been done?
>

Uhm, I personally have not been working on this but I think some folks at
TLL might be working on this. Aaron, you wanna chime in? Should we start
some discussion on what issues need to be addressed to get this developing
seriously?

> Also, is there a way to restart the PipelineManager? That is, if the
> PipelineManager were killed but I wanted the pipeline it was running
> when killed to continue running at a later date, is there a way to do
> this?
>

Oh definitely, we do it all the time. When you kill the PipelineManager,
all the job states are stored in the database. Normally what happens in
this scenario is that the pipeline user

1) first kills the PipelineManager script
2) does a bkill 0 (for lsf) for his jobs
3) does whatever fixing one needs and makes sure that the input and
   output databases are cleaned appropriately
4) The state of the jobs in the job table will now be a mix of jobs with
   status Failed, New and Submitted. You will need to set the jobs that
   are in the Submitted state back to New or Failed. This is because we
   killed the jobs with bkill before they could write their status back
   to the table, so the PipelineManager upon restart will think they are
   still running (it is not so clever as to check with LSF yet) and will
   only fetch the New|Failed jobs.

   So execute: update job set status="NEW" where status="SUBMITTED"
   in your pipeline database.
5) Remove the Pipeline lock file and run PipelineManager again, OR run
   PipelineManager with the -f option and it should remove it for you.
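Put together -- assuming the pipeline database is the usual MySQL one, and
using my_pipe as a stand-in for your database name (adjust user/host
options to your own setup) -- a restart session looks roughly like this:

   mysql my_pipe -e 'update job set status="NEW" where status="SUBMITTED"'
   perl bioperl-pipeline/scripts/PipelineManager -dbname my_pipe -f

where -f takes care of removing the stale lock file left by the previous
run before the manager starts again.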
hope that is clear

cheers,

shawn

> Thanks,
> Jeremy
> _______________________________________________
> bioperl-pipeline mailing list
> bioperl-pipeline@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-pipeline
>

From aaron at tll.org.sg  Fri Jun 27 16:47:23 2003
From: aaron at tll.org.sg (aaron@tll.org.sg)
Date: Fri Jun 27 03:52:35 2003
Subject: [Bioperl-pipeline] web, restart
In-Reply-To: 
References: 
Message-ID: <45596.203.116.185.151.1056700043.web@webmail.tll.org.sg>

Dear Shawn,

Hey, hope your network problems have cleared up; I can't detect anything
wrong from this end (even ssh connections from home).. was it a particular
machine or leo?

As for the web pipeline... it certainly sounds like an interesting option.
I was working on web-based monitoring software earlier that went thru three
development revamps over 2 years: the 1st version was Perl CGI, the 2nd
pure Java (app) over a proprietary tcp-based protocol, and we had to
finally settle on jsp/tomcat to avoid firewall/port problems, etc.

Our solution was based on a tiny server stub, called a monitor, that
maintains central control of the processes. Each incoming web connection
then queries this monitor for current process state, etc. Back then we used
shared memory to decouple the actual processes from state monitoring. This
can be done through Perl SysV IPC calls too, though our implementation was
in C.

Let me read thru the biopipe architecture to see how the processes are
handled (forked/threaded), managed by PID/lockfiles, and I'll get back to
the list on how best to begin the design of the web monitor/launcher.

Cheers,
aaron

>
> On Thursday, June 26, 2003, at 07:17 PM, jeremyp@sgx3.bmb.uga.edu
> wrote:
>
>> Hi,
>>
>> I remember there was some talk about including software to allow for
>> running pipelines through a web interface. Has this been done?
>>
>
> Uhm, I personally have not been working on this but I think some folks
> at TLL might be working on this. Aaron, you wanna chime in? Should we
> start some discussion on what issues need to be addressed to get this
> developing seriously?
>
>> Also, is there a way to restart the PipelineManager? That is, if the
>> PipelineManager were killed but I wanted the pipeline it was running
>> when killed to continue running at a later date, is there a way to do
>> this?
>>
>
> Oh definitely, we do it all the time. When you kill the
> PipelineManager, all the job states are stored in the database.
> Normally what happens in this scenario is that the pipeline user
>
> 1) first kills the PipelineManager script
> 2) does a bkill 0 (for lsf) for his jobs
> 3) does whatever fixing one needs and makes sure that the input and
>    output databases are cleaned appropriately
> 4) The state of the jobs in the job table will now be a mix of jobs
>    with status Failed, New and Submitted. You will need to set the
>    jobs that are in the Submitted state back to New or Failed. This is
>    because we killed the jobs with bkill before they could write their
>    status back to the table, so the PipelineManager upon restart will
>    think they are still running (it is not so clever as to check with
>    LSF yet) and will only fetch the New|Failed jobs.
>
>    So execute: update job set status="NEW" where status="SUBMITTED"
>    in your pipeline database.
> 5) Remove the Pipeline lock file and run PipelineManager again, OR run
>    PipelineManager with the -f option and it should remove it for
>    you.
>
> hope that is clear
>
> cheers,
>
> shawn
>
>
>
>> Thanks,
>> Jeremy
>> _______________________________________________
>> bioperl-pipeline mailing list
>> bioperl-pipeline@bioperl.org
>> http://bioperl.org/mailman/listinfo/bioperl-pipeline

From juguang at tll.org.sg  Fri Jun 27 17:07:56 2003
From: juguang at tll.org.sg (Juguang Xiao)
Date: Fri Jun 27 04:07:38 2003
Subject: [Bioperl-pipeline] web, restart
In-Reply-To: <45596.203.116.185.151.1056700043.web@webmail.tll.org.sg>
Message-ID: <75C26288-A876-11D7-AB59-000A957702FE@tll.org.sg>

On Friday, June 27, 2003, at 03:47 PM, aaron@tll.org.sg wrote:

>
> As for the web pipeline... it certainly sounds like an interesting
> option. I was working on web-based monitoring software earlier that
> went thru three development revamps over 2 years: the 1st version was
> Perl CGI, the 2nd pure Java (app) over a proprietary tcp-based
> protocol, and we had to finally settle on jsp/tomcat to avoid
> firewall/port problems, etc.

Sorry, when did we have a CGI version of the monitor?

> Our solution was based on a tiny server stub, called a monitor, that
> maintains central control of the processes. Each incoming web
> connection then queries this monitor for current process state, etc.
> Back then we used shared memory to decouple the actual processes from
> state monitoring. This can be done through Perl SysV IPC calls too,
> though our implementation was in C.
>
> Let me read thru the biopipe architecture to see how the processes are
> handled (forked/threaded), managed by PID/lockfiles, and I'll get back
> to the list on how best to begin the design of the web
> monitor/launcher.

Aaron,

The monitor system we plan to build is only at the job-counting level:
querying the running pipeline database and counting how many analyses
there are, how many jobs of each analysis are done or failed, and so on.
I do not think we are going to investigate at the process level. Correct
me if I am wrong, other guys.

------------ATGCCGAGCTTNNNNCT--------------
Juguang Xiao
Temasek Life Sciences Laboratory, National University of Singapore
1 Research Link, Singapore 117604
juguang at tll.org.sg

From shawnh at fugu-sg.org  Fri Jun 27 17:36:55 2003
From: shawnh at fugu-sg.org (Shawn Hoon)
Date: Fri Jun 27 04:35:16 2003
Subject: [Bioperl-pipeline] web, restart
In-Reply-To: <45596.203.116.185.151.1056700043.web@webmail.tll.org.sg>
Message-ID: <2E58C2BA-A8B5-11D7-AA7F-000A95783436@fugu-sg.org>

On Friday, June 27, 2003, at 08:47 AM, aaron@tll.org.sg wrote:

> Dear Shawn,
>
> Hey, hope your network problems have cleared up; I can't detect
> anything wrong from this end (even ssh connections from home).. was it
> a particular machine or leo?
>

Ai yah, it's the Fugu network... no problems when I go through the IMCB
network. Sorry!

> As for the web pipeline... it certainly sounds like an interesting
> option. I was working on web-based monitoring software earlier that
> went thru three development revamps over 2 years: the 1st version was
> Perl CGI, the 2nd pure Java (app) over a proprietary tcp-based
> protocol, and we had to finally settle on jsp/tomcat to avoid
> firewall/port problems, etc.
>

Ah, you certainly have the right experience for it. There are two areas
where we want to have biopipe placed on the web: monitoring and
management. They are pretty much decoupled in terms of the pipeline
backend. They will probably be unified through the web interface.
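For the monitoring half, the job-counting view juguang described really
boils down to a single query on the pipeline database, something along
these lines (column names are from memory of the job table, so check them
against the schema):

   select analysis_id, status, count(*)
   from   job
   group by analysis_id, status;

A first cut of a web monitor only has to run that and lay out the counts
per analysis.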
Elia had some good ideas for this and below are his abbreviated points:

Monitoring--
 -PipelineMonitor graphics (legacy work from Juguang)
 -web pages that will show a live monitor of a running pipeline
 -retrieve details of failing jobs etc

By monitoring, I mean we do not need to look at the process calls like you
mentioned below cuz we rely on the job scheduler, LSF in this case. Thus we
only need to provide the information in the pipeline job table like juguang
said. We may want to be smart about the monitoring, and put in some
heuristics to have the pipeline stop if all jobs that are sent to a node
come back failed very quickly, which may signal that something is not set
up properly and has escaped the initial setup checks.

For the web end, monitoring should not be too hard. A simple web-based one
may just be a cgi script that queries the pipeline database and refreshes
every so often. Of course, usually we have the pipeline database sitting
behind a firewall so we need some daemon... More fanciful would be an
applet...

Management --
 -write a client to run on the web server that can send small pipelines
  from the web to the pipeline server
 -manage multiple pipelines concurrently
 -start, suspend, resume, stop pipelines
 -clean up files and databases associated with finished pipelines

This work was started by Kiran and Yujin but it never took off. Time to
revisit. The idea is to have some interface, web or java app, that allows
one to launch pipelines. We wanted to have a daemon sitting on the server
that manages multiple pipelines. When a user submits a pipeline, the daemon
will run the pipeline setup script, launch the pipeline and return a
pipeline_id. In this way the user can query his jobs using this
pipeline_id. This is quite doable once we are able to bridge the data flow
from the web end, put things in the appropriate places, and have scripts
that automate setting up pipelines (which are mostly there already).

> Let me read thru the biopipe architecture to see how the processes are
> handled (forked/threaded), managed by PID/lockfiles, and I'll get back
> to the list on how best to begin the design of the web
> monitor/launcher.
>

Welcome aboard!

Shawn

> Cheers,
> aaron
>
>>
>> On Thursday, June 26, 2003, at 07:17 PM, jeremyp@sgx3.bmb.uga.edu
>> wrote:
>>
>>> Hi,
>>>
>>> I remember there was some talk about including software to allow for
>>> running pipelines through a web interface. Has this been done?
>>>
>>
>> Uhm, I personally have not been working on this but I think some
>> folks at TLL might be working on this. Aaron, you wanna chime in?
>> Should we start some discussion on what issues need to be addressed
>> to get this developing seriously?
>>
>>> Also, is there a way to restart the PipelineManager? That is, if the
>>> PipelineManager were killed but I wanted the pipeline it was running
>>> when killed to continue running at a later date, is there a way to
>>> do this?
>>>
>>
>> Oh definitely, we do it all the time. When you kill the
>> PipelineManager, all the job states are stored in the database.
>> Normally what happens in this scenario is that the pipeline user
>>
>> 1) first kills the PipelineManager script
>> 2) does a bkill 0 (for lsf) for his jobs
>> 3) does whatever fixing one needs and makes sure that the input and
>>    output databases are cleaned appropriately
>> 4) The state of the jobs in the job table will now be a mix of jobs
>>    with status Failed, New and Submitted. You will need to set the
>>    jobs that are in the Submitted state back to New or Failed. This
>>    is because we killed the jobs with bkill before they could write
>>    their status back to the table, so the PipelineManager upon
>>    restart will think they are still running (it is not so clever as
>>    to check with LSF yet) and will only fetch the New|Failed jobs.
>>
>>    So execute: update job set status="NEW" where status="SUBMITTED"
>>    in your pipeline database.
>> 5) Remove the Pipeline lock file and run PipelineManager again, OR
>>    run PipelineManager with the -f option and it should remove it for
>>    you.
>>
>> hope that is clear
>>
>> cheers,
>>
>> shawn
>>
>>
>>
>>> Thanks,
>>> Jeremy
>>> _______________________________________________
>>> bioperl-pipeline mailing list
>>> bioperl-pipeline@bioperl.org
>>> http://bioperl.org/mailman/listinfo/bioperl-pipeline
>
>
> _______________________________________________
> bioperl-pipeline mailing list
> bioperl-pipeline@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-pipeline
>

From elia at tll.org.sg  Fri Jun 27 12:26:35 2003
From: elia at tll.org.sg (Elia Stupka)
Date: Fri Jun 27 05:26:10 2003
Subject: [Bioperl-pipeline] web, restart
In-Reply-To: <2E58C2BA-A8B5-11D7-AA7F-000A95783436@fugu-sg.org>
Message-ID: <727E2046-A881-11D7-90A0-000A95770622@tll.org.sg>

> By monitoring, I mean we do not need to look at the process calls like
> you mentioned below cuz we rely on the job scheduler, LSF in this
> case. Thus we only need to provide the information in the pipeline job
> table like juguang said.

However, we want to overcome the limitations of a standalone system by
implementing a thin client that can run on each node, if we do want to
look at what is happening on that particular node, or read the files, etc.

> We may want to be smart about the monitoring, and put in some
> heuristics to have the pipeline stop if all jobs that are sent to a
> node come back failed very quickly, which may signal that something is
> not set up properly and has escaped the initial setup checks.

Absolutely. I'd like (if we get into this again) to make sure we go further
than simple monitoring, and start putting smarts into the software.

> sitting behind a firewall so we need some daemon... More fanciful
> would be an applet...

We most definitely need a server/client structure, which will enable us to
do tons of things in the future, so let's decide on it upfront.

> This work was started by Kiran and Yujin but it never took off.

Yujin is coming to work with us from mid-July for a couple of months; he
could help.

Elia

---
Bioinformatics Program Manager
Temasek Life Sciences Laboratory
1, Research Link
Singapore 117604
Tel. +65 6874 4945
Fax. +65 6872 7007

From juguang at tll.org.sg  Mon Jun 30 14:04:26 2003
From: juguang at tll.org.sg (Juguang Xiao)
Date: Mon Jun 30 01:03:59 2003
Subject: [Bioperl-pipeline] Re: BioPipe
In-Reply-To: <749E773C-A3E5-11D7-933C-000A95770622@tll.org.sg>
Message-ID: <52805F78-AAB8-11D7-928A-000A957702FE@tll.org.sg>

>
>> Currently, one manager is in charge of one running pipe.
>> If you want to run more, just start up more managers. Why do we need
>> one manager to handle many pipes?
>
> Because it is extremely counter-intuitive to have to run multiple
> applications to run many pipelines. Just like the whole cluster is
> managed with one application, so should pipelines be. This is not only
> a user interface issue. This is also a change to the schema, to handle
> multiple pipelines in one database, rather than many databases, and to
> have pipeline ids on the jobs, on the files, etc. etc.
>

One thing I need to remind everyone about is Perl's threads. I gave up on
Bio::DB::BioSQL::MultiDB using threads because threaded Perl raises more
problems than it brings benefits. If you want to implement the above, you
are betting on Perl threads. If the thread problems cannot be solved in
Perl, you cannot make a pipeline manager server program handle multiple
connections either.

My suggestion is to let the pipeline managers run as processes, each with
one database. On top of that, build a server program that connects with
the client and simply submits the processes to the cluster. Because of the
Perl thread problem, if it is real, we should use Java for that part. We
do not need a lot of intercommunication between the Java marshall and the
Perl pipeline managers, I think.

my $.02
Juguang

From jeremyp at sgx3.bmb.uga.edu  Mon Jun 30 15:35:59 2003
From: jeremyp at sgx3.bmb.uga.edu (jeremyp@sgx3.bmb.uga.edu)
Date: Mon Jun 30 14:35:33 2003
Subject: [Bioperl-pipeline] multiple pipelines
Message-ID: <34801.128.192.15.158.1056998159.squirrel@sgx3.bmb.uga.edu>

Hi,

I just wanted to check on something I haven't really tried yet. We're
planning on running multiple pipelines at the same time. So, for example,
we might have 5 different databases and potentially all 5 could be in use
at the same time. I was wondering if this might cause any problems...
specifically, the pipelines share the same NFSTMP_DIR. It seems like there
might be concurrency problems (specifically, the PipelineManager seems to
use the same names for the executable scripts: 1.pbs, 2.pbs ...). Does
this work out?

One other note: with our setup, reading/writing from/to an NFS directory
during a blast analysis is very I/O bound. I altered the Blast runnable to
include a very simple system for doing the actual run in a specific
directory (especially on a specific disk/filesystem). So, when we run the
blast file pipeline now, the input file is copied to a directory local to
the node on which a given analysis is running, the output is generated
there as well, and then it is copied back to the NFS-mounted directory the
analysis was started in. It does seem that having the database on an
NFS-mounted directory is ok. I don't know if anyone else has seen anything
similar (our CPU usage was fairly low when running purely off of an
NFS-mounted disk).

Jeremy
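The local-disk staging Jeremy describes boils down to copying the input to
node-local scratch, running there, and copying the result back. A minimal
sketch of that pattern in Perl -- the /scratch path, the database name and
the blastall invocation are placeholders for illustration, not the actual
Bio::Pipeline Blast runnable code:

   use strict;
   use File::Basename;
   use File::Copy;
   use File::Path qw(mkpath);

   # Stage an input file onto node-local scratch, run blast there,
   # then copy the result back to the NFS directory we started from.
   sub run_blast_on_local_disk {
       my ($infile, $outfile) = @_;
       my $local_dir = "/scratch/$ENV{USER}.$$";   # node-local, per-process
       mkpath($local_dir);

       my $local_in  = "$local_dir/" . basename($infile);
       my $local_out = "$local_dir/" . basename($outfile);
       copy($infile, $local_in) or die "copy in failed: $!";

       # placeholder for the real Blast runnable invocation
       system("blastall -p blastn -d mydb -i $local_in -o $local_out") == 0
           or die "blast failed: $?";

       copy($local_out, $outfile) or die "copy back failed: $!";
       unlink($local_in, $local_out);
       rmdir($local_dir);
       return $outfile;
   }

The same wrapper idea would apply to any runnable whose input and output
are plain files.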