Fairshare & Job Priority
The scheduler doesn’t start pending jobs first-come, first-served — it orders them by priority, and the largest factor in that priority is fairshare between groups (SLURM’s docs sometimes write it fair-share).
How fairshare works
Each group gets a slice of the cluster — its shares — based on the number of users in the group and the group’s investment in the cluster. The scheduler then compares how much the group has actually been using against that slice:
If the group’s usage is above its shares, its priority goes down.
If the group’s usage is below its shares, its priority goes up.
Fairshare is calculated per group, not per user. If your group has a lot of active users, you share one priority pool with them — so when the group is busy, you can end up competing with your own labmates to get jobs running.
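The above/below rule can be sketched as a small shell function. fairshare_trend is a hypothetical helper, not a SLURM command. It only compares two normalized numbers, for example the NormShares and NormUsage values that sshare reports, and says which way the rule pushes priority. SLURM's priority plugin computes the real fairshare factor from more inputs than this.

```shell
# Hypothetical helper (not part of SLURM): compare a group's normalized
# usage with its normalized shares and report which way fairshare pushes
# its priority. Arguments: SHARES USAGE, both fractions of the cluster.
fairshare_trend() {
    awk -v s="$1" -v u="$2" 'BEGIN {
        s += 0; u += 0   # force numeric comparison
        if (u > s)      print "above share: priority goes down"
        else if (u < s) print "below share: priority goes up"
        else            print "at share: no push either way"
    }'
}

# Shares and usage taken from the sshare example in this article.
fairshare_trend 0.048056 0.328213
```

With the article's example numbers it prints "above share: priority goes down".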
Check your group’s shares and usage
Use sshare to see where your group stands (replace groupname with your group’s SLURM account name):
sshare -A groupname -o account,normshare,normusage,levelfs
Account               NormShares   NormUsage    LevelFS
-------------------- ----------- ----------- ----------
groupname               0.048056    0.328213   0.146416
NormShares — the fraction of the cluster your group is entitled to.
NormUsage — the fraction your group has actually been using.
LevelFS — the ratio of shares to usage. Above 1 means the group is under-utilizing its share, so its priority goes up; below 1 means it’s over-utilizing, so its priority goes down.
In the example above, the group’s usage (0.328213) is far above its shares (0.048056), so LevelFS is 0.146416. That is well below 1, and the group’s priority is being pushed down.
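As a worked check, the example numbers are consistent with reading LevelFS as NormShares divided by NormUsage. SLURM computes LevelFS internally from the account’s usage relative to its siblings, so treat this as a sanity check on the example rather than the exact definition:

```latex
\text{LevelFS} \approx \frac{\text{NormShares}}{\text{NormUsage}}
  = \frac{0.048056}{0.328213} \approx 0.1464 < 1
```

Put the other way round, the group is using roughly 1 / 0.1464 ≈ 6.8 times its share.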
See where your jobs sit in the queue
sprio lists pending jobs sorted by priority, along with the components that make up each job’s score. For the GPU partition:
sprio -S y -l -p gpu
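To turn a priority listing into a single rough number, "how many pending jobs are ahead of mine?", you can rank jobs by priority. This sketch assumes you have already reduced the listing to two whitespace-separated columns, job ID and priority; that input format and the queue_position helper are hypothetical, since sprio’s column layout depends on the options used and the SLURM version. Ties count as not ahead.

```shell
# Hypothetical helper (not a SLURM command): read "JOBID PRIORITY" lines on
# stdin and print 1 + the number of jobs whose priority is strictly higher
# than JOBID's, i.e. its position in a priority-ordered queue.
queue_position() {
    awk -v job="$1" '
        { prio[$1] = $2 + 0 }
        END {
            if (!(job in prio)) { print "job " job " not found"; exit 1 }
            ahead = 0
            for (j in prio) if (prio[j] > prio[job]) ahead++
            print ahead + 1
        }'
}

# Made-up input: three pending jobs and their priorities.
printf '101 5200\n102 7100\n103 6400\n' | queue_position 103
```

Here job 103 prints 2, because one job (102) has a higher priority.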
What raises your priority
Lower group usage — the less your group has run recently, the higher its fairshare priority climbs.
Investment — groups that invest in the cluster receive a larger share.
Queue wait time — the longer a job sits pending, the higher its priority climbs, until it starts or its age factor reaches the maximum set by SLURM’s PriorityMaxAge.
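SLURM’s multifactor priority plugin treats the age factor as a number between 0 and 1 that grows with time spent pending and reaches 1 once the job has waited PriorityMaxAge; the scheduler then multiplies it by PriorityWeightAge. The sketch below assumes linear growth and uses a hypothetical age_contribution helper with made-up values (your site’s real settings appear in scontrol show config). It shows why waiting helps only up to a point.

```shell
# Hypothetical helper: approximate the age term of a job's priority.
# Arguments: minutes pending, PriorityMaxAge in minutes, PriorityWeightAge.
# Assumes the factor grows linearly with wait time; it is capped at 1.0,
# and the result is truncated to a whole number.
age_contribution() {
    awk -v waited="$1" -v max_age="$2" -v weight="$3" 'BEGIN {
        factor = (waited + 0) / (max_age + 0)
        if (factor > 1) factor = 1
        printf "%d\n", (weight + 0) * factor
    }'
}

# Made-up settings: PriorityMaxAge of 7 days (10080 minutes), weight 1000.
age_contribution 1440 10080 1000    # one day pending
age_contribution 20160 10080 1000   # two weeks pending: capped
```

With these made-up numbers, one day of waiting adds 142 priority points, and any wait of seven days or more adds the full 1000.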
See also
SLURM Commands for the full scheduler command reference, and Best Practices for keeping jobs efficient so you use less of your group’s share.