SLURM Scheduler – Job Priority and Wait Time

This article will help you understand the factors SLURM uses to set the priority of your job(s) and ultimately help you minimize wait times.

Wait Time

As a general rule, we do not guarantee short wait times for any group. Our cluster is configured to balance maximizing the utilization of our HPC resources with providing fair access to our entire research community. We encourage users to plan ahead to account for wait times.

Factors that affect your job’s wait time:

  1. The cluster operates almost entirely according to a first-in-first-out (FIFO) queue, which means that jobs requesting similar hardware are scheduled in the order they are submitted. A job’s priority therefore increases linearly with its wait time (age) until the job starts.
  2. The size, hardware, and partition (queue) you request will affect your wait time, particularly since demand fluctuates significantly. A very large job will wait longer for resources than a small job. While SLURM reserves hardware for a large job, it attempts to schedule small jobs as “backfill” as long as they can finish before the large job is ready to start. You can therefore reduce your wait time by accurately estimating the amount of time you need (see the example batch script after this list). Keep in mind that if you underestimate the necessary time, SLURM will terminate your job before your calculations are complete.
  3. Labs that purchased hardware within the last several years have priority built into their queues. When their jobs are submitted to the lab’s priority queue, they will receive the next job opening on the node(s) in their queue.
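
For example, a minimal batch script with an explicit time limit might look like the following (a sketch only; the resource requests and my_analysis.sh are placeholders to adapt to your own work):

#!/bin/bash
#SBATCH --time=02:00:00   # request only the time you expect to need
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G

# Placeholder for your actual workload
srun ./my_analysis.sh

Requesting a short, accurate time limit makes it more likely that SLURM can backfill your job into gaps while it reserves hardware for larger jobs.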

Job Priority Factors

The job’s priority at any given time will be a weighted sum of all the factors that have been enabled in the SLURM scheduler configuration.

All of the factors in the below Job_priority formula are floating point numbers that range from 0.0 to 1.0. The weights are unsigned, 32 bit integers. The job’s priority is an integer that ranges between 0 and 4294967295. The larger the number, the higher the job will be positioned in the queue, and the sooner the job will be scheduled. A job’s priority, and hence its order in the queue, can vary over time. For example, the longer a job sits in the queue, the higher its priority will grow when the age_weight is non-zero.

Job_priority =
	site_factor +
	(PriorityWeightAge) * (age_factor) +
	(PriorityWeightAssoc) * (assoc_factor) +
	(PriorityWeightFairshare) * (fair-share_factor) +
	(PriorityWeightJobSize) * (job_size_factor) +
	(PriorityWeightPartition) * (partition_factor) +
	(PriorityWeightQOS) * (QOS_factor) +
	SUM(TRES_weight_cpu * TRES_factor_cpu,
	    TRES_weight_<type> * TRES_factor_<type>,
	    ...)
	- nice_factor

The job priority values for the Mind cluster are set to the following (as of January 14, 2022, when this article was written):

PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityWeightFairshare=10000
PriorityWeightAge=10
PriorityWeightPartition=1000
PriorityWeightJobSize=1000
PriorityMaxAge=1-0
PriorityWeightQOS=0
PriorityWeightTRES=cpu=2000,mem=1,gres/gpu=400
AccountingStorageTRES=gres/gpu
FairShareDampeningFactor=5
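
These values can change over time. To check the scheduler’s current priority settings yourself, you can query the running configuration (scontrol is a standard SLURM utility):

[dpane@mind ~]$ scontrol show config | grep -i Priority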

The PriorityType=priority/multifactor setting provides a versatile method of ordering the queue of jobs waiting to be scheduled based on nine factors.

  1. Age: the length of time a job has been waiting in the queue, eligible to be scheduled
  2. Association: a factor associated with each association
  3. Fair-share: the difference between the portion of the computing resource that has been promised and the amount of resources that has been consumed
  4. Job size: the number of nodes or CPUs a job is allocated
  5. Nice: a factor that users can set to adjust the priority of their own jobs (see the example after this list)
  6. Partition: a factor associated with each node partition
  7. Quality of Service (QOS): a factor associated with each Quality Of Service
  8. Site: a factor dictated by an administrator or a site-developed job_submit or site_factor plugin
  9. TRES: each TRES Type has its own factor for a job, which represents the number of requested/allocated TRES of that Type in a given partition
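
Of these, nice is the only factor an ordinary user can set directly at submission time, and regular users may only lower their own priority with it (a negative adjustment, which raises priority, requires administrator privileges). For example (my_job.sh is a placeholder for your own batch script):

[dpane@mind ~]$ sbatch --nice=100 my_job.sh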

The weights assigned to each of the above factors can be adjusted to enact a policy that blends any combination of them in any proportion desired. For example, a site could configure fair-share to be the dominant factor (say 70%), set the job size and age factors to each contribute 15%, and set the partition and QOS influences to zero.
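
As an illustration only (hypothetical values, not the Mind cluster’s configuration), a slurm.conf enacting that 70/15/15 blend might look like:

PriorityType=priority/multifactor
PriorityWeightFairshare=7000
PriorityWeightJobSize=1500
PriorityWeightAge=1500
PriorityWeightPartition=0
PriorityWeightQOS=0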

The sprio utility

The sprio command provides a summary of the six factors that comprise each job’s scheduling priority. While squeue has format options (%p and %Q) that display a job’s composite priority, sprio can be used to display a breakdown of the priority components for each job. In addition, the sprio -w option displays the configured weights (PriorityWeightAge, PriorityWeightFairshare, etc.) for each factor.

[dpane@mind ~]$ sprio -w
          JOBID PARTITION   PRIORITY       SITE        AGE  FAIRSHARE    JOBSIZE  PARTITION                 TRES
        Weights                               1         10      10000       1000       1000 cpu=2000,mem=1,gres/

You can see the job priority of a job using the command sprio -j JobId:

[dpane@mind ~]$ sprio -j 1084880
          JOBID PARTITION   PRIORITY       SITE        AGE  FAIRSHARE    JOBSIZE  PARTITION                 TRES
        1084880 tarrq           1090          0          8          2         18       1000         cpu=62,mem=0
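
Note that the individual components sum to the composite priority: 0 (SITE) + 8 (AGE) + 2 (FAIRSHARE) + 18 (JOBSIZE) + 1000 (PARTITION) + 62 (TRES) = 1090.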
Updated on January 14, 2022
