Fairshare & Job Priority

The scheduler doesn’t start pending jobs first-come, first-served. It orders them by priority, and the largest factor in that priority is fairshare between groups (Slurm’s documentation sometimes spells it fair-share).

How fairshare works

Each group gets a slice of the cluster, called its shares, based on the number of users in the group and the group’s investment in the cluster. The scheduler then compares how much of the cluster the group has actually used recently against that slice:

If the group’s usage is above its shares, its priority goes down.
If the group’s usage is below its shares, its priority goes up.

Fairshare is calculated per group, not per user. Every member of a group draws on the same priority pool, so when many of your labmates are running jobs, the group’s usage climbs and you can end up competing with them to get your own jobs started.
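To make the per-group pooling concrete, here is a toy shell sketch (POSIX sh plus awk). The input format, the `fairshare_direction` function, and all the numbers are hypothetical and are not Slurm output. It only sums each member’s usage into the group’s pool and compares that pool with the group’s share, using the same up/down rule described above. Real fairshare weighs recent usage more heavily, which this ignores.

```shell
# Hypothetical sketch: input lines are "user group usage", and shares are
# passed as "group=share" pairs. Neither format comes from Slurm; the numbers
# are made-up fractions of the cluster, just to show the pooling.
fairshare_direction() {
  awk -v shares="$1" '
    BEGIN {
      n = split(shares, pairs, " ")
      for (i = 1; i <= n; i++) {
        split(pairs[i], kv, "=")
        share[kv[1]] = kv[2] + 0          # force numeric comparison later
      }
    }
    { pool[$2] += $3 }                    # every member feeds the group pool
    END {
      for (g in pool) {
        if (pool[g] > share[g])      dir = "down"
        else if (pool[g] < share[g]) dir = "up"
        else                         dir = "neutral"
        printf "%s usage=%.2f share=%.2f priority %s\n", g, pool[g], share[g], dir
      }
    }'
}

# Example: alice alone is under labA's share, but labA as a whole is over it.
printf 'alice labA 0.05\nbob labA 0.12\ncarol labB 0.20\n' |
  fairshare_direction "labA=0.10 labB=0.30"
```

Here labA reports usage=0.17 against share=0.10 and its priority goes down, even though alice’s own 0.05 is under that share; labB (0.20 of 0.30) goes up. The two lines may print in either order, since awk does not guarantee array iteration order.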

Check your group’s shares and usage

Use sshare to see where your group stands:

sshare -A groupname -o account,normshare,normusage,levelfs

Account              NormShares   NormUsage     LevelFS
-------------------- ----------- ----------- ----------
groupname               0.048056    0.328213    0.146416

NormShares — the fraction of the cluster your group is entitled to.
NormUsage — the fraction your group has actually been using.
LevelFS — the ratio of shares to usage. Above 1 means the group is under-utilizing its share, so its priority goes up; below 1 means it’s over-utilizing, so its priority goes down.

In the example above, the group’s usage (0.328213) is almost seven times its shares (0.048056), so LevelFS is 0.146416. That is well below 1, so this group’s priority is being pushed down.
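If you want to check the LevelFS arithmetic yourself, the sketch below recomputes NormShares / NormUsage from sshare’s machine-readable output. It assumes `sshare -n -P` (no header, `|`-delimited) and that the command prints the account row shown above; if your site’s sshare also lists per-user rows, each gets its own line. The function name `levelfs_from_sshare` is made up for this example, and Slurm computes LevelFS itself from unrounded values, so the last digits can differ.

```shell
# Sketch that recomputes LevelFS as NormShares / NormUsage from sshare's
# pipe-delimited output (-n drops the header, -P uses "|" as the delimiter).
# Assumes the fields are account|normshare|normusage, in that order.
levelfs_from_sshare() {
  awk -F'|' '
    NF < 3 { next }                       # skip blank or malformed lines
    $3 == 0 { printf "%s no recorded usage\n", $1; next }
    {
      lfs = $2 / $3                       # LevelFS = NormShares / NormUsage
      if (lfs > 1)      verdict = "under its share, priority up"
      else if (lfs < 1) verdict = "over its share, priority down"
      else              verdict = "at its share"
      printf "%s LevelFS=%.4f (%s)\n", $1, lfs, verdict
    }'
}

# On the cluster:
#   sshare -A groupname -n -P -o account,normshare,normusage | levelfs_from_sshare
# Offline, with the numbers from the example above:
printf 'groupname|0.048056|0.328213\n' | levelfs_from_sshare
```

With the sample numbers this prints `groupname LevelFS=0.1464 (over its share, priority down)`, matching the 0.146416 that sshare reported to four decimal places.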

See where your jobs sit in the queue

sprio lists pending jobs along with the factors that make up each job’s priority score. To list the GPU partition in long format (-l), sorted by priority (-S y):

sprio -S y -l -p gpu
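The columns sprio reports are the individual factors of Slurm’s multifactor priority: each factor is normalized to between 0 and 1, multiplied by a site-configured weight, and the results are added. The sketch below assumes this cluster uses that multifactor scheme and keeps only four factors. The weights and the `job_priority` function are hypothetical, so run `sprio -w` to see the weights your cluster actually uses.

```shell
# Hypothetical weights for illustration only; each cluster sets its own
# (PriorityWeightAge, PriorityWeightFairshare, ... in slurm.conf).
W_AGE=1000
W_FAIRSHARE=10000
W_JOBSIZE=500
W_PARTITION=1000

# Args: normalized age, fairshare, job-size and partition factors (0.0-1.0).
# Prints the weighted sum, truncated to an integer. Real Slurm adds more terms
# (QOS, association, TRES, site factor, nice) that this sketch leaves out.
job_priority() {
  awk -v a="$1" -v f="$2" -v j="$3" -v p="$4" \
      -v wa="$W_AGE" -v wf="$W_FAIRSHARE" -v wj="$W_JOBSIZE" -v wp="$W_PARTITION" \
      'BEGIN { printf "%d\n", wa * a + wf * f + wj * j + wp * p }'
}

# Two jobs identical except for their group's fairshare factor:
job_priority 0.5 0.25 0.5 1.0   # group over its share
job_priority 0.5 0.75 0.5 1.0   # group under its share
```

With these made-up weights the first job scores 4250 and the second 9250. When fairshare carries the largest weight, it swings the total far more than the other terms.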

What raises your priority

Lower group usage — the less your group has run recently, the higher its fairshare priority climbs.
Investment — groups that invest in the cluster receive a larger share.
Queue wait time — a job’s priority rises the longer it sits pending, until it starts or its aging contribution reaches the configured cap.
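The aging term can be sketched the same way. Slurm’s multifactor plugin grows a job’s age factor with the time it has been eligible to run, up to 1.0 at the PriorityMaxAge limit. The sketch assumes linear growth and a hypothetical 7-day cap, and `age_factor` is an illustrative name, not a Slurm command.

```shell
# Hypothetical cap for illustration; the real one is the cluster's
# PriorityMaxAge setting (ask your admins or check scontrol show config).
MAX_AGE_HOURS=168   # 7 days

# Arg: hours the job has been pending and eligible to run.
# Prints the normalized age factor: grows linearly, then stays at 1.00.
age_factor() {
  awk -v h="$1" -v max="$MAX_AGE_HOURS" \
      'BEGIN { f = h / max; if (f > 1) f = 1; printf "%.2f\n", f }'
}

for hours in 0 42 84 168 500; do
  printf '%4s h pending -> age factor %s\n' "$hours" "$(age_factor "$hours")"
done
```

Under this cap a job pending for 42 hours gets 0.25 of the full age bonus, 84 hours gets 0.50, and anything past 168 hours stays at 1.00, which is the point where waiting longer stops helping.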