Job Submission Limits

Maximum Concurrent Jobs

The cluster allows a user to submit up to 10,000 jobs at one time.

Why This Limit Exists

This limit keeps the cluster scheduling system from hanging. Very large numbers of queued jobs can overwhelm the SLURM scheduler.

Workarounds

If you need to run more jobs than this limit allows:

Manual Batching

Submit jobs in batches of at most 10,000, and wait for some to complete before submitting more.
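
Before submitting a batch by hand, it can help to check how many jobs you already have in the queue. The following is a minimal sketch, not a site-provided tool: the names LIMIT, queued_job_count, and remaining_slots are hypothetical, and it assumes that every job array task counts as one job toward the limit (hence the -r option to squeue).

```shell
#!/bin/bash
# Hypothetical helpers; LIMIT mirrors the 10,000-job limit described above.
LIMIT=10000

# Print the number of jobs the current user has pending or running.
# -h drops the header line; -r lists each job array element on its own line
# (assumption: array tasks count individually toward the limit).
queued_job_count() {
    squeue -u "$USER" -h -r | wc -l
}

# Print how many more jobs can be submitted before reaching LIMIT.
remaining_slots() {
    local used
    used=$(queued_job_count)
    echo $(( LIMIT - used ))
}
```

After sourcing these functions, running remaining_slots prints the size of the largest batch you could submit at that moment.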

Custom Job Management

Implement a wrapper script that monitors your job count and submits new jobs as others complete:

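#!/bin/bash
# Submit TOTAL_JOBS jobs while keeping fewer than MAX_JOBS in the queue.
MAX_JOBS=9000       # stay safely below the 10,000-job limit
TOTAL_JOBS=50000

for i in $(seq 1 "$TOTAL_JOBS"); do
    # Wait while too many jobs are pending or running.
    # -h drops the header line; -r lists each job array task on its own line.
    while [ "$(squeue -u "$USER" -h -r | wc -l)" -ge "$MAX_JOBS" ]; do
        sleep 60
    done

    # Pass the job index to the job script as its first argument.
    sbatch my_job.sh "$i"
done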
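
A wrapper that calls squeue before every submission adds its own load to the scheduler. One possible variant, sketched below, checks the queue once and then fills all free slots before checking again. The names submit_throttled, MAX_JOBS, and POLL_SECONDS are illustrative, and my_job.sh stands in for your own job script.

```shell
#!/bin/bash
# Sketch: fill free queue slots in bulk so squeue runs once per refill,
# not once per submitted job. All names here are illustrative.
MAX_JOBS=${MAX_JOBS:-9000}          # stay below the 10,000-job limit
POLL_SECONDS=${POLL_SECONDS:-60}    # pause between checks when the queue is full

# Usage: submit_throttled TOTAL_JOBS JOB_SCRIPT
submit_throttled() {
    local total="$1" script="$2"
    local next=1 queued free
    while [ "$next" -le "$total" ]; do
        # Count this user's pending and running jobs, one line per array task.
        queued=$(squeue -u "$USER" -h -r | wc -l)
        free=$(( MAX_JOBS - queued ))
        if [ "$free" -le 0 ]; then
            sleep "$POLL_SECONDS"
            continue
        fi
        # Submit up to $free jobs before looking at the queue again.
        while [ "$free" -gt 0 ] && [ "$next" -le "$total" ]; do
            sbatch "$script" "$next" || return 1   # stop if a submission fails
            next=$(( next + 1 ))
            free=$(( free - 1 ))
        done
    done
}
```

For example, submit_throttled 50000 my_job.sh submits jobs 1 through 50000 while keeping at most MAX_JOBS of them in the queue.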

Job Arrays

If your jobs are similar, use a SLURM job array instead. It is more efficient because a single sbatch call submits every task:

#SBATCH --array=1-10000

After the first array completes, submit the next one.
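
Submitting one array after another can also be scripted. The sketch below is one way to do it under stated assumptions: submit_arrays is a hypothetical helper, my_job.sh stands in for your job script, and each array reuses the indices 1 to N while an offset is passed as the first script argument. It relies on sbatch --wait, which blocks until the submitted job (here, every task of the array) has finished.

```shell
#!/bin/bash
# Sketch: run TOTAL tasks as consecutive job arrays of at most CHUNK tasks.
# Usage: submit_arrays TOTAL CHUNK JOB_SCRIPT
submit_arrays() {
    local total="$1" chunk="$2" script="$3"
    local offset=0 size
    while [ "$offset" -lt "$total" ]; do
        # The last array may be smaller than CHUNK.
        size=$(( total - offset ))
        if [ "$size" -gt "$chunk" ]; then size=$chunk; fi
        # Task N of this array handles overall item offset + N.
        # --wait returns only after all tasks have ended; a non-zero exit
        # status (failed submission or failed task) stops the loop.
        sbatch --wait --array="1-$size" "$script" "$offset" || return 1
        offset=$(( offset + size ))
    done
}
```

With this layout, submit_arrays 50000 10000 my_job.sh runs five arrays in sequence, and inside my_job.sh the overall item number can be computed as $(( $1 + SLURM_ARRAY_TASK_ID )).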