[Bioperl-pipeline] [Question] Pipeline vs. PBS

Lenny Teytelman teytelma at cshl.edu
Thu Apr 3 08:58:15 EST 2003


Hello.

Are you using OpenPBS or PBS-Pro?

In any case, I believe that this might be helpful:

"Bad UID for Job execution
If a user attempts to submit a job to PBS receives the following error
message Bad UID for execution, the user has not been authorized to run on
the server or execution host.

PBS does not assume a uniform UID space; that means that UserA on HostX
may not be the same user as UserA on HostY. Therefore if UserA at HostX
submits a job to be run on HostY as UserA, or anyother named user, then
PBS must be told that is ok. This authorization is performed by PBS by
calling the common C library call ruserok(). Thus on HostY, either HostX
must appear in the file /etc/hosts.equiv, or UserA at HostX must appear in
UserA's .rhosts file. "
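
As a concrete sketch (the host and user names below are placeholders, not
from our setup; substitute your actual submit host and username), the
authorization on the execution host would look something like this:

    # /etc/hosts.equiv on the execution host -- trust every user coming
    # from the submit host
    submit-host.example.org

    # ...or, per user, an entry in ~usera/.rhosts on the execution host
    submit-host.example.org  usera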

Under Kiran's direction, we got BioPipe to work wonderfully with PBS-Pro.
I don't know if this will be useful for you, but I am attaching a bunch of
notes that I made on the setup process (see the very bottom of the
message).


On Wed, 2 Apr 2003, Sang Chul Choi wrote:

> Dear Stupka!
> 
> Firstly, thank you for your advice about PBS.
> 
> After a few weeks of rest from Biopipe, I have started to
> work with it on PBS. If this question is not a good one,
> just ignore it; the answer could get long and you could be bored.
> 
> The question is:
> I got the PBS source and installed it on my Linux machine,
> started the three daemons (pbs_server, pbs_sched, pbs_mom),
> and created a queue.
> After configuring Biopipe's PipeConf.pm (queue name, etc.),
> I finally ran PipelineManager, which had been working in local
> mode.
> It did not work as I expected. ( I would like to cry )
> 
> The error was that the PBS qsub command failed due to a PBS
> security problem. I ran qsub directly like this:
> =================================
> qsub -q dque /tmp//8.pbs
> =================================
> The error output was
> =================================
> qsub: Bad UID for job execution
> =================================
> 
> Of course, this question does not seem to be about Biopipe itself.
> Would you let me know of any solution? I am sorry for asking 
> a not-for-Biopipe question.
> 
> In addition to that, is there any helpful document about PBS
> configuration related to Biopipe?
> 
> Again, I am sorry for bothering the Biopipe developers and 
> you with this tedious question.
> 
> Sincerely yours,
> 
> Sang Chul Choi
> 
> _______________________________________________
> bioperl-pipeline mailing list
> bioperl-pipeline at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-pipeline
> 

October 11, 2002.

BioPipeline Prerequisites:

[Not a prerequisite, but a suggestion - join the biopipeline mailing list
     http://www.bioperl.org/mailman/listinfo/bioperl-pipeline]    

0)  Directory Structure:

    Central directory
        mkdir /home/BioPipe

    Executables
        mkdir /home/BioPipe/programs

    Temp space
        mkdir /home/BioPipe/tmp

1)  mySQL on the server
    -download to blue2:mysql/
              MySQL-3.23.51-1.i386.rpm
              MySQL-devel-3.23.51-1.i386.rpm
              MySQL-client-3.23.51-1.i386.rpm
              MySQL-shared-3.23.51-1.i386.rpm
    -rpm -i *
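
    A quick sanity check after the rpms go in (assuming the rpm started
    mysqld for you, as it normally does):

        # confirm the server is up and answering on the default socket
        mysqladmin -u root version
        mysqladmin -u root status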

2)  DBI on the server and all of the nodes

  a)Server
    Copy 
         Data-ShowTable-3.3.tar.gz
         DBI-1.18.tar.gz, and
         Msql-Mysql-modules-1.2216.tar.gz
    to blue:mysql/DBI_DBD

    Follow DBD  INSTALLATION directions under
            Msql-Mysql-modules-1.2216

(The DBD install kept complaining about '/usr/bin/ld: cannot find -lz'.
     Found a post on Google Groups that suggested:
           sudo ln -s /usr/lib/libz.so.1 /usr/lib/libz.so
     That solved it.)
     
  b) Nodes
     Had trouble installing DBI on the nodes; a straightforward
     perl -MCPAN install did not work.  Had to force install:
         perl -MCPAN -e shell
         force install Bundle::DBD::mysql
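
     To confirm the forced install actually left working modules on a
     node, a couple of one-liners (nothing here is node-specific):

         # print the installed DBI and DBD::mysql versions
         perl -MDBI -e 'print "DBI $DBI::VERSION\n"'
         perl -MDBD::mysql -e 'print "DBD::mysql $DBD::mysql::VERSION\n"'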


3) Connecting from the nodes to MYSQL on the server
   Need to give the nodes permission to connect to mysql on blue2.
   For this, change the host column of the user table:
    mysql -u root mysql
    update user set Host='%' where Host='blue2.cshl.org' and User='root';
    exit;
    mysqladmin flush-privileges -u root
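
   To verify from a slave node that the grant change took (hostname and
   credentials as configured above; needs either the mysql client or DBI
   installed on the node):

       # should connect and return without an access-denied error
       mysql -h blue2.cshl.org -u root -e 'select 1;'

       # or go through DBI, which is what the pipeline itself uses
       perl -MDBI -e 'DBI->connect("dbi:mysql:database=mysql;host=blue2.cshl.org","root","") or die $DBI::errstr; print "ok\n"'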

   

4)  XML modules:
        XML::Parser (already installed)
        XML::SimpleObject
                Tried with CPAN and manually, and both times got errors.
                It complained ferociously and tried to install the LibXML
                version of it.  Ended up just copying SimpleObject.pm to
                the /usr/lib/perl5/vendor_perl/5.6.1/XML/ directory.
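
                However it ends up installed, a one-liner shows whether
                perl can actually find and load it:

                    # should print 'ok' with no module-not-found error
                    perl -MXML::SimpleObject -e 'print "ok\n"'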

5) BioPerl

 a) bioperl-live, bioperl-run, and bioperl-pipeline

   cvs -d :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl login
   (when prompted, the password is 'cvs')
   cvs -d :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl checkout bioperl-live
   cvs -d :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl checkout bioperl-run
   cvs -d :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl checkout bioperl-pipeline

6) Environment variables
#In .cshrc add:
setenv PIPEHOME /u/blue2/BioPipe
setenv PERL5LIB /u/blue2/BioPipe/bioperl-live:/u/blue2/kiran/src/bioperl-pipeline
setenv PATH /usr/pbs/bin/:${PATH}
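
A short check, after re-sourcing .cshrc, that the shell (and anything it
spawns) sees what the pipeline needs:

    # qsub should resolve under /usr/pbs/bin, and PERL5LIB should list
    # both source trees
    which qsub
    echo $PERL5LIB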
------------------

Pipeline Installation.

1) Move bioperl-live to the user's home directory

   mv bioperl-live /home/lenny/Source/BioPipe_local/bioperl-pipeline

2) Edit the configuration file bioperl-pipeline/Bio/Pipeline/PipeConf.pm

   set:

     NFSTMP_DIR => '/u/blue2/kiran/tmp/',

     DBI_DRIVER => 'mysql',
     DBHOST     => 'blue2.cshl.org',
     DBNAME     => 'test_XML',
     DBUSER     => 'root',
     DBPASS     => '',

     BATCH_MOD  => 'PBS',
     QUEUE      => 'workq',

     RUNNER     => '/u/blue2/kiran/src/bioperl-pipeline/Bio/Pipeline/runner.pl',
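
   Since PipeConf.pm gets hand-edited, it is worth syntax-checking it
   before moving on (path as set above):

     # catches a missing quote or comma introduced while editing
     perl -c /u/blue2/kiran/src/bioperl-pipeline/Bio/Pipeline/PipeConf.pm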

3) Run a test script to create the database and schema, and populate the
   pipeline tables using a sample xml template.

     perl Xml2Db.pl -schema ../t/data/schema.sql -p templates/simple_blast_setup.xml -dbhost localhost
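
   If that worked, the pipeline tables should now exist in the database
   named in PipeConf.pm (test_XML in the setup above):

     # list the freshly created pipeline tables
     mysql -u root test_XML -e 'show tables;'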

4) Define the XML for the pipeline.


   The three components of the XML are:
       database_setup
       iohandler_setup
       pipeline_flow_setup

   DATABASE_SETUP defines where the pipeline input and output objects will
   reside and how to communicate with the database to fetch/write the
   objects.
   IOHANDLER_SETUP identifies the actual read and write methods for the
   above databases, with respect to each kind of input/output object.
   PIPELINE_FLOW_SETUP is the heart of the pipeline.  This is where the
   logic of all the jobs, inputs, outputs, and job dependencies is
   specified.
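
   A quick way to see these three blocks in a real example is to look at
   the sample template used in step 3 (path as used there; adjust if your
   checkout differs):

     # the three top-level sections of a pipeline xml
     grep -n -E '<(database_setup|iohandler_setup|pipeline_flow_setup)>' templates/simple_blast_setup.xml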


   Data Flow:

   1) Split Rice Contigs
   2) blat
   3) sort blat results
   4) pslReps
   5) filter

5) run Xml2Db.pl

6) run PipelineManager.pl

//

October 17

QUICK NOTES

PBS:

-have to set 'scheduler_iteration' in the server variables (qmgr) to 10
     seconds; otherwise, you can have 10-minute coffee breaks in the
     cluster.
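
     Concretely, that is a single qmgr command on the PBS server (10 is
     in seconds; pick whatever polling interval suits your cluster):

          # default is 600 seconds, i.e. the 10-minute coffee break
          qmgr -c "set server scheduler_iteration = 10"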
-have to make sure that PBS can scp from each of the nodes.  A common
     problem is the 'first time host' question from ssh, after which PBS
     can't write the log files back.
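
     One way around the first-time prompt is just to ssh once in each
     direction by hand so the host keys get cached in known_hosts; a
     rough sh-style sketch (the node names here are made up):

          # run on the server, then do the equivalent from each node back
          for n in node01 node02 node03 node04; do
              ssh $n hostname     # answer 'yes' once per node
          done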
-turn off verbose logging
      a)For the server:      
            qmgr
            set server log_events=7
            
      b) For the scheduler:

            vi /var/spool/PBS/sched_priv/sched_config
            log_filter: 1

-The path to the pbs binaries should be in the environment variables
     (PATH), or 'qsub' will be very confused.

-Everything should be from the NFS perspective.
     i.e. Check to make sure that every directory and file you define in
     the xml is visible from the slave nodes.  This includes executable
     programs, input/output directories and files, and the environment
     variables.

     Must be NFS aware:
          1) PERL5LIB in .cshrc
          2) executables (repeatmasker, blast, etc.) in the xml
          3) input/output files and directories in the xml
          4) NFSTMP_DIR and RUNNER in PipeConf.pm

-MYSQL 

        Set the MySQL timeout (the wait_timeout variable).  The default is
        28800 seconds, or 8 hours.  Make it more reasonable, depending on
        how long your pipeline might run.

        There is also a maximum for the number of allowed connections to
        mysql.  The default is 100, and if your cluster has more nodes than
        that, you will be in trouble.  At the same time, you don't want to
        set it absurdly high, as that can hide runaway connection leaks.

        Start with something like:

        safe_mysqld --set-variable wait_timeout=267820 \
                    --set-variable max_connections=1500
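
        You can confirm that the running server picked both values up:

        # check the live values of the two variables
        mysqladmin -u root variables | grep -E 'wait_timeout|max_connections'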

//



