Job submission

Jobs are submitted to the cluster with qsub. In its most basic form, qsub takes the job script as its only argument. To submit a job:

> qsub script.sh

Submitted jobs are identified by a job id (a unique id assigned by the cluster) and a job name (defaults to the name of the script). The job id cannot be changed by the user. Users can set the job name to help distinguish multiple instances of the same script.

> qsub -N run_script.1 script.sh
> qsub -N run_script.2 script.sh

Scripts can also accept arguments from the command line during submission. They can be accessed within the script as $1, $2 … $n, where n is the number of arguments. To submit a job with arguments:

> qsub -N run_script_all script.sh run1 run2 run3
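
Inside the script, the arguments behave as in any shell script. The following is a minimal sketch of how script.sh might walk through them; the function name process_runs and the echoed messages are illustrative and not part of any BIAC script.

```shell
# Hypothetical body for script.sh: handle each argument in turn.
# "$#" is the number of arguments; "$@" expands to all of them.
process_runs() {
    echo "received $# argument(s)"
    for run in "$@"; do
        # Replace this echo with the real per-run processing.
        echo "processing $run"
    done
}

# In a job script this would be called as: process_runs "$@"
```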

To process the runs in parallel, submit one job per argument:

> qsub -N run_script.1 script.sh run1
> qsub -N run_script.2 script.sh run2 
> qsub -N run_script.3 script.sh run3
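
With many runs, the same pattern can be generated in a loop. This is a sketch only: submit_runs and the QSUB override are hypothetical names introduced here. By default the function prints the qsub commands instead of running them, so the result can be checked before anything is submitted.

```shell
# Submit one job per run name, naming the jobs run_script.1, run_script.2, ...
# QSUB is a hypothetical override: when unset, the commands are only printed
# (dry run); set QSUB=qsub to actually submit.
submit_runs() {
    script=$1
    shift
    i=1
    for run in "$@"; do
        ${QSUB:-echo qsub} -N "run_script.$i" "$script" "$run"
        i=$((i + 1))
    done
}

# Example (dry run): submit_runs script.sh run1 run2 run3
```

Once the printed commands look right, the same call with QSUB=qsub set in the environment would perform the submissions.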

Job restrictions

Each node has a finite amount of memory installed, and because the nodes are diskless, limits are set on the amount of RAM a job may use. By default, each submitted job is assigned 10G of RAM. If your job requires more than 10G, you may request a higher limit with the “-l h_vmem” option; otherwise you don't have to do anything. This prevents memory oversubscription and helps distribute the load across the available machines.

> qsub -N run_script.1 -l h_vmem=12G,vf=12G script.sh run1

The above example will request/reserve 12G of available memory. “vf” ensures that your job will not be sent to a node unless it has the required amount available. If your job exceeds the requested “h_vmem”, the grid engine will terminate it and you will receive a notice. In most cases you will not have to do anything, since 10G is a significant amount. The amount of RAM used by your jobs is listed as “Max vmem” in the emails sent from the cluster. The restriction prevents memory from being over-allocated and a single job from crashing an entire node, which would kill other users' jobs.

You can also look up the memory used by a previous job with qacct if you have the job number (look for the “maxvmem” entry):

> qacct -j JOBNUM
maxvmem   4.315G

You can also request this information for currently running jobs with qstat (look for the “usage” line):

> qstat -j JOBNUM
usage    1:                 cpu=3:07:56:02, mem=36938.10261 GBs, io=15.57702, vmem=769.992M, maxvmem=1.451G
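
To pull just the peak value out of either output, a small filter can help. This sketch assumes the output looks like the two samples above; extract_maxvmem is a hypothetical helper name, and other grid engine versions may format these lines differently.

```shell
# Read qacct -j or qstat -j output on stdin and print the first maxvmem value.
# Handles both "maxvmem   4.315G" and "..., maxvmem=1.451G".
extract_maxvmem() {
    sed -n 's/.*maxvmem[= ]*\([0-9.]*[KMGT]\{0,1\}\).*/\1/p' | head -n 1
}

# Example: qacct -j JOBNUM | extract_maxvmem
```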

The maximum available on any node is ~750GB, so a job that requests more than that will sit in the queue indefinitely.
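
The limits above can be turned into a small guard that builds the memory option only when it is needed. This is a sketch under the figures stated in this section (10G default, roughly 750G per node); mem_request is a hypothetical helper, and it takes whole gigabytes only.

```shell
# Print the -l option for a memory need given in whole gigabytes.
# Prints nothing when the 10G default is enough, and fails above ~750G,
# since no node could ever satisfy such a request.
mem_request() {
    gb=$1
    if [ "$gb" -gt 750 ]; then
        echo "mem_request: ${gb}G exceeds the ~750G available on any node" >&2
        return 1
    fi
    if [ "$gb" -gt 10 ]; then
        echo "-l h_vmem=${gb}G,vf=${gb}G"
    fi
}

# Example: qsub -N run_script.1 $(mem_request 12) script.sh run1
```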

Please do not request additional resources unless you absolutely need them. Requested resources are deducted from the amount available to everyone else, so unneeded requests reduce the capacity of a node for other jobs.

There is a global limit of 60 slots and/or 1920G of RAM for any single user.

There is a 6GB cumulative quota on all HOME directories.

Job status

The current status of a job can be checked with qstat. This will return the current list of jobs owned by the user.

> qstat 
job-ID  prior    name       user      state  submit/start at      queue                        slots ja-task-ID
--------------------------------------------------------------------------------------------------------------- 
   918  0.00000  script.sh  deshmukh  r      10/09/2007 08:20:24  users.q@node6.biac.duke.edu      1
   919  0.00000  script.sh  deshmukh  qw     10/09/2007 08:20:26                                   1

Each job listing has the following relevant properties:

job-id	Unique id assigned by the cluster.
name	Name of the job. Default value is the name of the script submitted.
user	Username of person who submitted the job.
state	Current state of the job. This could be “r” → running or “qw” → waiting in queue.
submit/start at	Submission time in “qw” state and start time in “r” state.
queue	Queue and node on which the job is being run. This field is empty in “qw” state.
slots	Number of processors the job will use.

When the job is completed, it will no longer appear in qstat listings.
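
For use in scripts, the state column can be picked out of the listing. This sketch assumes the column layout shown in the sample above (job id in the first column, state in the fifth); job_state is a hypothetical helper name.

```shell
# Read a qstat listing on stdin and print the state of one job id.
# Prints nothing if the job is not listed (for example, already finished).
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Example: qstat | job_state 918
```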

The status of all jobs owned by all users can be checked with qstatall.

> qstatall
Running jobs:
job-ID  # name         owner    start time          running in
-----------------------------------------------------------------------------
  1294  1 script1.sh   deshmukh 10/09/2007 12:24:01 users.q
  1295  1 script2.sh   bizzell  10/09/2007 12:24:16 users.q

Job delete

A submitted job can be deleted with qdel. It takes the job-id (listed by qstat) as its argument.

> qdel 9999

All jobs for a particular user can be deleted with the following command.

> qdel -u username
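
To delete only the jobs with a given name, the job ids can be collected from the qstat listing first. This sketch assumes the qstat column layout shown earlier (job id first, name third); job_ids_named is a hypothetical helper, and because qstat may shorten long job names in its listing, matching is most reliable with short names.

```shell
# Read a qstat listing on stdin and print the ids of jobs with a given name.
# The first two lines (column header and separator) are skipped.
job_ids_named() {
    awk -v name="$1" 'NR > 2 && $3 == name { print $1 }'
}

# Example: qdel $(qstat | job_ids_named script.sh)
```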

Template Script

Jobs are usually written in bash and are similar to local bash scripts in syntax and usage. In addition, they contain cluster-related directives, identified by lines starting with “#$”, which send job setup information to the cluster. Scripts also contain requests for access to experiment data. The BIAC template script is a good starting point for testing job submission and a base script for all jobs. Begin by making a copy of the template script below.

The template script requests access to an experiment folder and lists its contents. It needs a valid BIAC Experiment Name (case-sensitive) that is accessible by the user. Submit your copy (here named myscript.sh) using qsub.

> qsub -v EXPERIMENT=Dummy.01 myscript.sh

Run qstat to check the job status. The job will initially be in the “qw” state. Wait a few seconds and run qstat again; the job should now be in the “r” state. If the job is not listed, it has completed. The results of the job should appear in the experiment folder under Analysis (e.g. \\Server\BIAC\Dummy.01\Analysis ) as myscript.sh.xxx.out (xxx is the job id). If you don't see the file, check the experiment name that was provided at submission.
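
The repeated qstat check can be automated with a polling loop. This is a sketch only: wait_for_job, QSTAT, and POLL are hypothetical names introduced here. Note that a failing or empty qstat is treated the same as a finished job.

```shell
# Block until a job id no longer appears in the qstat listing.
# QSTAT and POLL are hypothetical overrides: the command used to list
# jobs (default: qstat) and the seconds between checks (default: 10).
wait_for_job() {
    while ${QSTAT:-qstat} | awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'; do
        sleep "${POLL:-10}"
    done
}

# Example: wait_for_job 918 && echo "job 918 has left the queue"
```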

The script is divided into multiple sections. The user sections are USER DIRECTIVE and USER SCRIPT. The remaining sections handle setup and don't require modification for most scripts. They are critical for access to your data.

USER DIRECTIVE

If you want mail notifications when your job completes or fails, you need to set the correct email address. Replace the dummy email address (user@school.edu) with your own in the following line.

#$ -M user@school.edu

USER SCRIPT

Add your script in this section.
Within this section you can access the requested experiment folder using $EXPERIMENT. All paths are relative to this variable, e.g. $EXPERIMENT/Data or $EXPERIMENT/Analysis. The $EXPERIMENT variable is a temporary directory path (assigned for a specific job) that points to the requested experiment directory. Do not use it in place of the actual experiment name (e.g. Dummy.01) if the name is required within your script.

  # Correct - lists the contents of the experiment folder 
  ls -l $EXPERIMENT 
 
  # Correct - lists the contents of the Analysis folder in your experiment directory      
  ls -l $EXPERIMENT/Analysis 
 
  # Incorrect - The output will be  " My experiment name is /path/to/experiment "
  # instead of the desired " My experiment name is Dummy.01 "    
  echo "My experiment name is $EXPERIMENT"
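
If the script does need the experiment name, one option is to pass it in a second variable at submission. This sketch assumes a hypothetical variable EXPNAME supplied alongside EXPERIMENT (for example with qsub -v EXPERIMENT=Dummy.01,EXPNAME=Dummy.01 myscript.sh); the template itself does not define it.

```shell
# EXPNAME is a hypothetical variable passed at submission; it is not set
# by the template. Unlike EXPERIMENT, it is never rewritten to a path.
report_experiment() {
    echo "My experiment name is ${EXPNAME:?EXPNAME not provided}"
}

# In the USER SCRIPT section this would be called as: report_experiment
```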

All terminal output is routed to the “Analysis” folder under the experiment directory, i.e. $EXPERIMENT/Analysis. To change this path, set the OUTDIR variable at the beginning of this section to another location under your experiment folder.

  OUTDIR=$EXPERIMENT/Analysis/ClusterLogs

On successful completion the job will return 0. If you need to set another return code, set the RETURNCODE variable in this section. To avoid conflict with system return codes, set a RETURNCODE higher than 100.

  RETURNCODE=110

Arguments to the USER SCRIPT are accessible in the usual fashion, e.g. $1 $2 $3.
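
Putting these pieces together, a USER SCRIPT section might look like the sketch below. The function name check_runs and the Data/<run> folder layout are assumptions made for illustration. The mkdir is included because the template moves the log file into OUTDIR at the end, which fails if that folder does not exist.

```shell
# Hypothetical USER SCRIPT section. EXPERIMENT is set by the template's
# PRE-USER section; the Data/<run> layout is assumed for illustration.
check_runs() {
    OUTDIR=$EXPERIMENT/Analysis/ClusterLogs
    mkdir -p "$OUTDIR"          # the template's final mv needs this folder
    for run in "$@"; do
        if [ ! -d "$EXPERIMENT/Data/$run" ]; then
            echo "missing data for $run"
            RETURNCODE=110      # picked up by the POST-USER section
        fi
    done
}

# In the template this would be called as: check_runs "$@"
```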

basic.sh

#!/bin/sh
 
# -- BEGIN GLOBAL DIRECTIVE --
#$ -S /bin/sh
#$ -o $HOME/$JOB_NAME.$JOB_ID.out
#$ -e $HOME/$JOB_NAME.$JOB_ID.out
#$ -m ea
# -- END GLOBAL DIRECTIVE --
 
# -- BEGIN PRE-USER --
#Name of experiment whose data you want to access
EXPERIMENT=${EXPERIMENT:?"Experiment not provided"}
 
EXPERIMENT=`findexp $EXPERIMENT`
EXPERIMENT=${EXPERIMENT:?"Returned NULL Experiment"}
 
if [ "$EXPERIMENT" = "ERROR" ]
then
        exit 32
else
#Timestamp
echo "----JOB [$JOB_NAME.$JOB_ID] START [`date`] on HOST [$HOSTNAME]----"
# -- END PRE-USER --
# **********************************************************
 
# -- BEGIN USER DIRECTIVE --
# Send notifications to the following address
#$ -M user@school.edu
 
# -- END USER DIRECTIVE --
 
# -- BEGIN USER SCRIPT --
# User script goes here
 
# List all files in the requested Experiment directory
ls -l $EXPERIMENT
 
 
 
# -- END USER SCRIPT -- #
 
# **********************************************************
# -- BEGIN POST-USER --
echo "----JOB [$JOB_NAME.$JOB_ID] STOP [`date`]----"
OUTDIR=${OUTDIR:-$EXPERIMENT/Analysis}
mv $HOME/$JOB_NAME.$JOB_ID.out $OUTDIR/$JOB_NAME.$JOB_ID.out
RETURNCODE=${RETURNCODE:-0}
exit $RETURNCODE
fi
# -- END POST-USER --

Notes

If you ever edit your scripts on a non-Unix machine, please run dos2unix on them before submitting.
Sometimes hidden Windows characters will prevent the script from running.
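
A quick check for these characters before submission can look like the sketch below. It looks for carriage returns, the usual sign of Windows line endings; has_crlf is a hypothetical helper name.

```shell
# Return success (0) if the file contains carriage returns,
# the usual sign of Windows line endings.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Example: has_crlf myscript.sh && dos2unix myscript.sh
```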

Brain Imaging & Analysis Center
