Install torque on a single-node CentOS 6.4
PBS is widely used as a queuing system on clusters of all sizes, from small to huge. This short post summarizes the installation steps needed to get a single-node torque server running on CentOS 6.4 (64-bit). You may wonder what the point is. A queuing system can certainly manage a heavy workload intelligently even on a single machine, but the main goal here is to have a development framework around torque / PBS for bigger clusters, so this is essentially a testing environment. Torque is a fork of PBS and as such is very similar to the widely used PBS Pro.
Resolving dependencies :
I chose to compile torque from source, which requires a few dependencies :
[root]# yum install libxml2-devel openssl-devel gcc gcc-c++ boost-devel
Configuring your firewall :
Open the following TCP ports on your firewall : 15003, 15001 (you can use the graphical firewall setup tool available in CentOS, or go through iptables).
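If you go the iptables route, a minimal sketch could look like the following. The function name torque_fw_rules is made up for this post and is not part of torque. It only prints the rules for the two ports listed above, so you can review them before running them as root.

```shell
#!/bin/bash
# torque_fw_rules is a hypothetical helper, not part of torque.
# It prints (does not apply) one iptables rule per TCP port given.
torque_fw_rules() {
    for port in "$@"; do
        echo "iptables -I INPUT -p tcp --dport ${port} -j ACCEPT"
    done
}

# Ports taken from the text above; adapt them to your setup.
torque_fw_rules 15001 15003
```

Once the printed rules have been run as root, service iptables save keeps them across reboots on CentOS 6.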
Building torque :
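First download the latest torque release from the Adaptive Computing website or via the command line (the -O option makes sure the file is saved under the name that the tar command below expects) :
wget -O torque-4.2.9.tar.gz "http://www.adaptivecomputing.com/downloading/?file=/torque/torque-4.2.9.tar.gz"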
tar -xzvf torque-4.2.9.tar.gz
cd torque-4.2.9
Next, let's go with a default installation, where binaries and libraries are installed to /usr/local.
./configure
[root]# make
[root]# make install
If you have not already done so, add /usr/local/bin and /usr/local/sbin to the PATH variable of both your user and root (add it to your .bashrc or .cshrc).
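For bash users, the PATH change can be sketched as below. The helper name add_torque_path is an assumption of this post. It appends the export line to the file you pass in, and only if that exact line is not already there, so it is safe to run more than once.

```shell
#!/bin/bash
# add_torque_path is a hypothetical helper; $1 is the rc file to edit.
add_torque_path() {
    rc_file="$1"
    # Single quotes keep $PATH literal so it is expanded at login time.
    line='export PATH=/usr/local/bin:/usr/local/sbin:$PATH'
    # Append only if the exact line is missing.
    grep -qxF "$line" "$rc_file" 2>/dev/null || echo "$line" >> "$rc_file"
}
```

Call it as add_torque_path ~/.bashrc, once as your user and once as root. For csh the equivalent line in .cshrc would be set path = (/usr/local/bin /usr/local/sbin $path).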
Next, install the init scripts for the torque authorization daemon (trqauthd) and for the other torque daemons so they can be started as services later, register the library path, and start trqauthd :
[root]# cp contrib/init.d/trqauthd /etc/init.d/
[root]# cp contrib/init.d/pbs_mom /etc/init.d/pbs_mom
[root]# cp contrib/init.d/pbs_server /etc/init.d/pbs_server
[root]# cp contrib/init.d/pbs_sched /etc/init.d/pbs_sched
[root]# chkconfig --add trqauthd
[root]# chkconfig --add pbs_mom
[root]# chkconfig --add pbs_server
[root]# chkconfig --add pbs_sched
[root]# echo '/usr/local/lib' > /etc/ld.so.conf.d/torque.conf
[root]# ldconfig
[root]# service trqauthd start
Configuring torque
Add the name of the host running the torque server to /var/spool/torque/server_name. The library path (/usr/local/lib in /etc/ld.so.conf.d/torque.conf, followed by ldconfig) was already set in the previous step, so there is no need to set it again.
Initialize the serverdb by executing the following as root from the torque-4.2.9 source directory :
[root]# ./torque.setup root
Add the compute node (the server itself) to the nodes file. This can be done by adding the following into the /var/spool/torque/server_priv/nodes file :
MYMACHINENAME np=4
where MYMACHINENAME is the name of your node and np indicates the number of CPUs (cores) available to the queue. Adapt this to your system.
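If you prefer not to type the line by hand, here is a small sketch that builds it. write_nodes_line is a made-up helper name, and falling back to hostname and nproc for the defaults is my assumption. Check that the resulting name matches the one you put in server_name.

```shell
#!/bin/bash
# write_nodes_line is a hypothetical helper.
# $1 = nodes file, $2 = node name (default: hostname), $3 = cores (default: nproc)
write_nodes_line() {
    nodes_file="$1"
    node_name="${2:-$(hostname)}"
    cores="${3:-$(nproc)}"
    # Overwrites the file, which is fine for a single-node setup.
    echo "${node_name} np=${cores}" > "$nodes_file"
}
```

As root, write_nodes_line /var/spool/torque/server_priv/nodes would then produce the single line shown above for the local machine.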
You also need to define the server by adding the following to the /var/spool/torque/mom_priv/config file :
$pbsserver MYMACHINENAME
$logevent 255
Here again MYMACHINENAME indicates the name of the server issuing jobs. As the node and the server are the same machine in our configuration, specify the same name as in the nodes file above.
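The MOM config file can be generated the same way. This is only a sketch, write_mom_config is a hypothetical helper name, and the default of using hostname for the server name is my assumption. The one thing to watch for in shell is that $pbsserver and $logevent must end up in the file literally, so they are kept in single quotes.

```shell
#!/bin/bash
# write_mom_config is a hypothetical helper.
# $1 = config file, $2 = server name (default: hostname)
write_mom_config() {
    config_file="$1"
    server_name="${2:-$(hostname)}"
    # $pbsserver and $logevent are torque directives, not shell variables,
    # so they are single-quoted to keep the shell from expanding them.
    {
        printf '%s %s\n' '$pbsserver' "$server_name"
        printf '%s %s\n' '$logevent' '255'
    } > "$config_file"
}
```

As root, write_mom_config /var/spool/torque/mom_priv/config writes the two lines for the local machine.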
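Finish the configuration by restarting the server and starting the MOM daemon (pbs_mom normally runs only on the compute nodes, but here the server is also the node) :
qterm -t quick
pbs_server
pbs_mom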
Check that you can see your node by issuing the pbsnodes -a command.
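To turn that check into something scriptable, the sketch below counts the nodes reporting a free state. count_free_nodes is a made-up helper, and it assumes that pbsnodes -a prints an indented "state = free" line for each healthy node, so verify this against the output on your own machine.

```shell
#!/bin/bash
# count_free_nodes is a hypothetical helper.
# It reads "pbsnodes -a" style text on stdin and prints the number of
# lines of the (assumed) form "    state = free".
count_free_nodes() {
    grep -c '^[[:space:]]*state = free[[:space:]]*$'
}
```

Use it as pbsnodes -a | count_free_nodes. On this single-node setup anything other than 1 means the MOM is not talking to the server yet.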
Start the scheduler on the server using :
pbs_sched
As a user, log in at least once to the server via ssh from the server itself, so that the server is added to your known_hosts file :
ssh username@MYMACHINENAME
Queue configuration
Create a new queue, which we name test here :
qmgr -c "create queue test queue_type=execution"
qmgr -c "set queue test enabled=true"
qmgr -c "set queue test started=true"
qmgr -c "set server scheduling=True"
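If you expect to create more than one queue, the same four settings can be generated for any queue name. This is a sketch under the assumption that you feed the directives to qmgr on standard input. The helper name queue_directives is made up for this post.

```shell
#!/bin/bash
# queue_directives is a hypothetical helper; $1 is the queue name.
# It prints the qmgr directives used above, one per line.
queue_directives() {
    queue="$1"
    cat <<EOF
create queue ${queue} queue_type=execution
set queue ${queue} enabled=true
set queue ${queue} started=true
set server scheduling=True
EOF
}
```

Run it as queue_directives test | qmgr, then check the result with qmgr -c "print server" or qstat -q.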
First test job submission
Create a sample job submission file called test.sh containing the following lines :
#!/bin/bash
#PBS -l walltime=00:01:00
#PBS -l nice=19
#PBS -q test
date
sleep 10
date
This should run for about 10 seconds. Submit it with qsub test.sh, then check that the job is in the queue using qstat. Torque should also produce a test.sh.e# and a test.sh.o# file as output, where # is the job number.
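For repeated testing it can be handy to generate the job file from a script. The sketch below writes the same job as above. write_test_job is a hypothetical helper name, and the queue argument defaulting to test is my choice.

```shell
#!/bin/bash
# write_test_job is a hypothetical helper.
# $1 = job file to create, $2 = queue name (default: test)
write_test_job() {
    job_file="$1"
    queue="${2:-test}"
    # The here-document expands ${queue}; everything else is literal.
    cat > "$job_file" <<EOF
#!/bin/bash
#PBS -l walltime=00:01:00
#PBS -l nice=19
#PBS -q ${queue}
date
sleep 10
date
EOF
}
```

A typical round trip is write_test_job test.sh, then qsub test.sh, then qstat. Once the job is done, the .o file should contain two date lines roughly ten seconds apart.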