Utilizing the Array Features of the HPC Cluster
When a job is submitted to the HPC cluster with the -t option, the SGE (Sun Grid Engine) batch processor treats it as an array submission and sets environment variables that the job can use to split its work into smaller jobs on the fly. The full usage of the -t option is:
-t [start]-[end]:[step]
If step is excluded, a value of 1 is assumed.
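To make the numbering concrete, here is a minimal sketch that lists the task IDs a given range would produce. The function name task_ids is hypothetical, and the script only mimics the numbering; it does not call SGE.

```shell
#!/bin/sh
# Hypothetical helper: print the task IDs that "-t start-end:step" would yield.
# It only imitates the numbering and does not talk to the scheduler.
task_ids() {
    start=$1
    end=$2
    step=${3:-1}    # step defaults to 1 when it is left out
    id=$start
    while [ "$id" -le "$end" ]; do
        echo "$id"
        id=$((id + step))
    done
}

# The range used in the example submission script: prints 1 and 51, one per line
task_ids 1 100 50
```

Each printed value corresponds to the SGE_TASK_ID that one task of the array would see.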
Any application or scripting language that accepts command line arguments, or can read environment variables, can take advantage of this array feature when running on the cluster. The example below shows how to use it with R.
Source of the submission script:
#!/bin/sh
# Give the job a name
#$ -N R_Array
# set the working directory on all hosts to
# the directory where the job was started
#$ -cwd
# send output to job.log (STDOUT + STDERR)
#$ -o job.log
#$ -j y
# Setup array variables (range and step of chunks)
#$ -t 1-100:50
# specify the hardware platform to run the job on.
# options are: cos, amd64-low, em64t
#$ -q amd64-low
mystart=${SGE_TASK_ID}
myend=$((${SGE_TASK_ID} + ${SGE_TASK_STEPSIZE} - 1))
# command to run. ONLY CHANGE THE NAME OF YOUR APPLICATION
/usr/local/apps64/R/bin/R --slave --args $mystart $myend < test.r > output.${SGE_TASK_ID}
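When the range does not divide evenly by the step (for example -t 1-100:30), the myend computed above runs past the end of the range for the last task. The sketch below shows one way to clamp it. It assumes that SGE_TASK_LAST holds the upper bound given to -t; the function name chunk_end is hypothetical, and the fallback values exist only so the snippet can be tried outside the cluster.

```shell
#!/bin/sh
# Hypothetical helper: last index of a chunk, clamped to the end of the range.
chunk_end() {
    task_id=$1
    step=$2
    last=$3
    end=$((task_id + step - 1))
    if [ "$end" -gt "$last" ]; then
        end=$last
    fi
    echo "$end"
}

# Inside an array job SGE sets these variables; the defaults after ":-"
# are assumptions that only apply when running outside SGE.
mystart=${SGE_TASK_ID:-1}
myend=$(chunk_end "$mystart" "${SGE_TASK_STEPSIZE:-1}" "${SGE_TASK_LAST:-$mystart}")
echo "$mystart $myend"
```

With -t 1-100:30 the task IDs are 1, 31, 61 and 91, and the last task would then process 91-100 instead of 91-120.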
Source of the test.r application:
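# Get the command line arguments.
# commandArgs() returns the R executable, "--slave", "--args", and then
# the two values passed by the submission script.
args <- commandArgs()
# Assign the ranges to mybegin and myend (elements 4 and 5)
mybegin <- as.integer(args[4])
myend <- as.integer(args[5])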
cat("number(x) Sqr(x)\n")
cat("------------------ ----------\n")
for (i in mybegin:myend) {
cat(i," ",i*i,"\n")
}
q()
One simple command:
qsub submit.sh
will submit the test.r application to the cluster, and the cluster will use the arguments from the -t option to run each section as a separate task, which the scheduler can place on a separate node. Our example above covers the range from 1 to 100 with a step size of 50, so two tasks are created (2 separate jobs). One processes the range 1-50, while the second processes 51-100.
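Because each task writes its own output.<task id> file, the results end up split across several files. A minimal sketch for stitching them back together in task order, once all tasks have finished, could look like this. The function name combine_outputs and the file name combined.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: concatenate output.<id> files in task order.
# Arguments: start, end, step (the same values given to -t) and a destination file.
combine_outputs() {
    start=$1
    end=$2
    step=$3
    dest=$4
    : > "$dest"    # create or truncate the destination file
    id=$start
    while [ "$id" -le "$end" ]; do
        if [ -f "output.$id" ]; then
            cat "output.$id" >> "$dest"
        else
            echo "warning: output.$id is missing" >&2
        fi
        id=$((id + step))
    done
}
```

For the example above, running combine_outputs 1 100 50 combined.txt in the job's working directory would append output.1 and then output.51 to combined.txt.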
How to use R Libraries which aren't installed on the cluster
