Differences

This shows you the differences between two versions of the page.

--- biac:cluster:submit [2011/09/19 14:57]
petty [Job submission]
+++ biac:cluster:submit [2023/02/23 18:43]
@@ Line 1: / Line 1: @@
-====== Job submission ======
-Jobs are submitted to the cluster with **qsub**. The most basic usage of qsub has the job script as its only argument. To submit a job,
-  > qsub script.sh
-Submitted jobs are identified by a job id (a unique id assigned by the cluster) and a job name (defaults to the name of the script). The job id cannot be changed by the user. Users can set the job name to help distinguish multiple instances of the same script.
-  > qsub -N run_script.1 script.sh
-  > qsub -N run_script.2 script.sh
-Scripts can also accept arguments from the command line during submission. They can be accessed within the script as $1, $2 ... $n (where n is the number of arguments). To submit a job with arguments,
-  > qsub -N run_script_all  script.sh run1 run2 run3
-To parallelize processing of the above script,
-  > qsub -N run_script.1 script.sh run1
-  > qsub -N run_script.2 script.sh run2
-  > qsub -N run_script.3 script.sh run3
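The per-run submissions above can also be generated with a loop. The sketch below is a dry run: it only prints the qsub commands so they can be reviewed before anything is submitted. The function name `print_submit_commands` is hypothetical; the script and run names are the ones used in the example above.

```shell
# Print one qsub command per run (dry run; nothing is submitted).
print_submit_commands() {
    script=$1
    shift
    i=1
    for run in "$@"; do
        echo "qsub -N run_script.$i $script $run"
        i=$((i + 1))
    done
}

print_submit_commands script.sh run1 run2 run3
```

Once the printed commands look right, the `echo` line could be replaced with the qsub call itself.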
-====== Job restrictions ======
-Each node has a finite amount of memory installed, and because the nodes are diskless there are limits on the amount of RAM a job may use.  By default, each submitted job is assigned 4G of RAM.  If your job requires more than 4G, request a higher limit with the **"-l h_vmem"** directive; otherwise you don't have to do anything.  These limits prevent memory oversubscription and distribute the load better across the available machines.
-  > qsub -N run_script.1 -l h_vmem=5G script.sh run1
-The above example requests (reserves) 5G of available memory.  Your job will not be sent to a node unless that node has the requested amount available.  If the job exceeds the requested amount, the grid engine will terminate it.  In most cases you will not have to do anything, since 4G is a significant amount.  The amount of RAM used by your jobs is listed as **"Max vmem"** in the emails sent from the cluster.
-For previous jobs, you can also get it with qacct if you have the job number (look for the "maxvmem" entry):
-  > qacct -j JOBNUM
-  > maxvmem   4.315G
-Also, you can request the info from currently running jobs with qstat ( look for the "usage" information ) :
-  > qstat -j JOBNUM
-  > usage    1:                 cpu=3:07:56:02, mem=36938.10261 GBs, io=15.57702, vmem=769.992M, maxvmem=1.451G
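A reported peak can be turned into an h_vmem request. The following is a minimal sketch, not a cluster-provided tool: the function name `suggest_h_vmem` and the policy of rounding up to a whole gigabyte and adding 1G of headroom are assumptions made for illustration. It only handles values reported with a G suffix, as in the qacct example above.

```shell
# Read qacct-style output on stdin and print a suggested h_vmem value.
# Assumption: round the peak up to a whole gigabyte, then add 1G of headroom.
suggest_h_vmem() {
    awk '$1 == "maxvmem" && $2 ~ /G$/ {
        value = $2
        sub(/G$/, "", value)      # strip the unit suffix
        value += 0                # force a numeric comparison below
        gb = int(value)
        if (gb < value) gb++      # round up to a whole gigabyte
        printf "%dG\n", gb + 1
    }'
}

# Sample line copied from the qacct output above.
echo 'maxvmem   4.315G' | suggest_h_vmem
```

With real accounting data, the input would come from `qacct -j JOBNUM` instead of the sample line.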
-====== Job status ======
-The current status of a job can be checked with **qstat**. This will return the current list of jobs owned by the user.
-  > qstat
-  job-ID  prior    name       user      state  submit/start at      queue                        slots ja-task-ID
-  ---------------------------------------------------------------------------------------------------------------
-  0.00000  script.sh  deshmukh  r      10/09/2007 08:20:24  users.q@node6.biac.duke.edu      1
-  0.00000  script.sh  deshmukh  qw     10/09/2007 08:20:26                                   1
-Each job listing has the following relevant properties
-| job-id | Unique id assigned by the cluster. |
-| name   | Name of the job. Default value is the name of the script submitted.|
-| user   | Username of person who submitted the job. |
-| state  | Current state of the job. This could be  "r" -> running  or "qw" -> waiting in queue. |
-| submit/start at | Submission time in "qw" state and start time in "r"  state. |
-|queue  | Queue and node on which the job is being run. This field is empty in "qw" state. |
-|slots | Number of processors the job will use. |
-When the job is completed, it will no longer appear in **qstat** listings.
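When many jobs are queued, the state column can be summarized rather than read line by line. The sketch below assumes the state is the fifth column and that two header lines precede the job rows. The function name `count_states` and the job ids 1001 and 1002 in the sample are hypothetical.

```shell
# Summarize qstat-style output: count running ("r") and queued ("qw") jobs.
# Assumes two header lines and the state in the fifth column.
count_states() {
    awk 'NR > 2 { n[$5]++ } END { printf "running=%d queued=%d\n", n["r"], n["qw"] }'
}

# Sample listing modeled on the one above; the job ids are made up.
count_states <<'EOF'
job-ID  prior    name       user      state  submit/start at      queue                        slots ja-task-ID
---------------------------------------------------------------------------------------------------------------
1001  0.00000  script.sh  deshmukh  r      10/09/2007 08:20:24  users.q@node6.biac.duke.edu      1
1002  0.00000  script.sh  deshmukh  qw     10/09/2007 08:20:26                                   1
EOF
```

On the cluster, the same function could be fed from the live listing with `qstat | count_states`.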
-The status of jobs owned by all users can be checked with **qstatall**.
-  > qstatall
-  Running jobs:
-  job-ID  # name         owner    start time          running in
-  -----------------------------------------------------------------------------
-  1 script1.sh   deshmukh 10/09/2007 12:24:01 users.q
-  1 script2.sh   bizzell  10/09/2007 12:24:16 users.q
-====== Job delete ======
-A submitted job can be deleted with **qdel**. It takes the job-id (listed by **qstat**) as its argument.
-  > qdel 9999
-====== Template Script ======
-Jobs are usually written in bash. They are similar to local bash scripts in syntax and usage. In addition, they contain cluster-related directives, identified by lines starting with " #$ ". These are used to send job-related setup information to the cluster. Scripts also contain requests for access to experiment data. The BIAC template script is a good starting point for testing job submission and a good base script for all jobs. Begin by making a copy of the template script.
-  cp /usr/local/packages/qsub_templates/basic.sh myscript.sh
-The template script requests access to an experiment folder and lists its contents. It needs a valid BIAC Experiment Name (case-sensitive) that is accessible by the user. Submit myscript.sh using qsub.
-  qsub -v EXPERIMENT=Dummy.01 myscript.sh
-Run **qstat** to check job status. The job will initially be in the "qw" state. Wait a few seconds and run qstat again; the job should now be in the "r" state. If you don't see a listing, the job has completed. The results of the job should appear in the experiment folder under Analysis (e.g. \\Server\BIAC\Dummy.01\Analysis) as myscript.sh.xxx.out (xxx is the job id). If you don't see the file, check the experiment name that was provided at submission.
-The script is divided into multiple sections. The user sections are [[biac:cluster:submit#user directive|USER DIRECTIVE]] and [[biac:cluster:submit#user script|USER SCRIPT]]. The remaining sections are setup related and don't require modification for most scripts. They are critical for access to your data.
-==== USER DIRECTIVE ====
-If you want mail notifications when your job completes or fails, you need to set the correct email address. Replace the dummy email address (user@school.edu) with your own address in the following line.
-  #$ -M user@school.edu
-==== USER SCRIPT ====
-  * Add your script in this section.
-  * Within this section you can access the requested experiment folder using <color navy>$EXPERIMENT</color>. All paths are relative to this variable, e.g. <color navy>$EXPERIMENT</color>/Data or <color navy>$EXPERIMENT</color>/Analysis. The $EXPERIMENT variable is a temporary directory path (assigned for a specific job) that points to the requested experiment directory. Do not use it in place of the actual experiment name (e.g. Dummy.01) if the name is required within your script.
-<code bash>
-  # Correct - lists the contents of the experiment folder
-  ls -l $EXPERIMENT
-  # Correct - lists the contents of the Analysis folder in your experiment directory
-  ls -l $EXPERIMENT/Analysis
-  # Incorrect - The output will be  " My experiment name is /path/to/experiment "
-  # instead of the desired " My experiment name is Dummy.01 "
-  echo "My experiment name is $EXPERIMENT"
-</code>
-  * All terminal output is routed to the " Analysis " folder under the Experiment directory i.e. <color navy>$EXPERIMENT</color>/Analysis. To change this path, set the <color navy>OUTDIR</color> variable at the beginning of this section to another location under your experiment folder.
-<code bash>
-  OUTDIR=$EXPERIMENT/Analysis/ClusterLogs
-</code>
-  * On successful completion the job will return 0. If you need to set another return code, set the <color navy>RETURNCODE</color> variable in this section. To avoid conflict with system return codes, set a <color navy>RETURNCODE</color> higher than 100.
-<code bash>
-  RETURNCODE=110
-</code>
-  * Arguments to the USER SCRIPT are accessible in the usual fashion eg:  <color navy>$1 $2 $3</color>.
-<code bash basic.sh>
-#!/bin/sh
-# --- BEGIN GLOBAL DIRECTIVE --
-#$ -S /bin/sh
-#$ -o $HOME/$JOB_NAME.$JOB_ID.out
-#$ -e $HOME/$JOB_NAME.$JOB_ID.out
-#$ -m ea
-# -- END GLOBAL DIRECTIVE --
-# -- BEGIN PRE-USER --
-#Name of experiment whose data you want to access
-EXPERIMENT=${EXPERIMENT:?"Experiment not provided"}
-EXPERIMENT=`findexp $EXPERIMENT`
-EXPERIMENT=${EXPERIMENT:?"Returned NULL Experiment"}
-if [ $EXPERIMENT = "ERROR" ]
-then
-        exit 32
-else
-#Timestamp
-echo "----JOB [$JOB_NAME.$JOB_ID] START [`date`] on HOST [$HOSTNAME]----"
-# -- END PRE-USER --
-# **********************************************************
-# -- BEGIN USER DIRECTIVE --
-# Send notifications to the following address
-#$ -M user@school.edu
-# -- END USER DIRECTIVE --
-# -- BEGIN USER SCRIPT --
-# User script goes here
-# List all files in the requested Experiment directory
-ls -l $EXPERIMENT
-# -- END USER SCRIPT -- #
-# **********************************************************
-# -- BEGIN POST-USER --
-echo "----JOB [$JOB_NAME.$JOB_ID] STOP [`date`]----"
-OUTDIR=${OUTDIR:-$EXPERIMENT/Analysis}
-mv $HOME/$JOB_NAME.$JOB_ID.out $OUTDIR/$JOB_NAME.$JOB_ID.out
-RETURNCODE=${RETURNCODE:-0}
-exit $RETURNCODE
-fi
-# -- END POST USER--
-</code>
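The USER SCRIPT conventions described above (paths relative to $EXPERIMENT, command-line arguments, and a custom RETURNCODE) can be combined as in the following sketch. The function name `check_runs` and the per-run directories under Data are hypothetical, and the temporary directory only stands in for the path the template assigns to EXPERIMENT.

```shell
# Sketch of a USER SCRIPT body: check that each run named on the command line
# exists under $EXPERIMENT/Data and record a custom return code if one is missing.
check_runs() {
    RETURNCODE=0
    for run in "$@"; do
        if [ -d "$EXPERIMENT/Data/$run" ]; then
            echo "found $run"
        else
            echo "missing $run"
            RETURNCODE=110   # above 100 to avoid system return codes
        fi
    done
}

# Stand-in for the experiment path; inside a real job the template sets EXPERIMENT.
EXPERIMENT=$(mktemp -d)
mkdir -p "$EXPERIMENT/Data/run1"
check_runs run1 run2
echo "RETURNCODE=$RETURNCODE"
```

Inside the template, the body would be called as `check_runs "$@"`, and the POST-USER section would then exit with the recorded RETURNCODE.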
-==== Notes ====
-  * If you ever edit your scripts on a non-Unix machine, run dos2unix on them before submitting.
-  * Scripts edited on Windows sometimes contain hidden characters that will prevent them from running.
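The hidden characters mentioned in the notes are typically carriage returns from Windows line endings. The sketch below shows one way to detect and strip them with standard tools when checking a script by hand; the function names `has_crlf` and `strip_cr` are hypothetical, and this is an illustration rather than a replacement for dos2unix.

```shell
# Succeed if the file contains at least one carriage return.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Write a copy of the file with all carriage returns removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Demonstration on a throwaway file with a Windows line ending.
tmp=$(mktemp)
printf 'echo hello\r\n' > "$tmp"
if has_crlf "$tmp"; then
    strip_cr "$tmp" "$tmp.unix"
    echo "converted"
fi
```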

Brain Imaging & Analysis Center
