Frequently Asked Questions
Account & Access
Cluster IP Space - IPs to whitelist on remote firewalls
Job Management
Software & Containers
Troubleshooting
Quick Answers
How do I log in to the cluster?
ssh username@login.hpc.caltech.edu
How do external collaborators get an account?
Contact your PI to request access through help-hpc@caltech.edu.
How should I acknowledge research on the cluster?
Add the following to all resulting publications and presentations:
The computations presented here were conducted in the Resnick High Performance Computing Center, a facility supported by Resnick Sustainability Institute at the California Institute of Technology.
Why won’t my job start?
Common reasons:
Resources unavailable - Requested resources exceed availability
Queue depth - Many jobs ahead of yours
Fairshare - Your group has used significant resources recently
Check with: squeue -u $USER and scontrol show job JOBID
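As a rough sketch, the reason SLURM records for each pending job can be tallied with squeue's %r format field. The function name pending_reasons is hypothetical. Resources and Priority are standard SLURM reason codes, though which ones you see depends on how the scheduler is configured.

```shell
# Count pending jobs by the reason SLURM reports for each one.
# Usage: pending_reasons [username]   (defaults to $USER)
pending_reasons() {
    # -h drops the header, -t PENDING keeps only waiting jobs,
    # -o "%r" prints just the reason column
    squeue -u "${1:-$USER}" -h -t PENDING -o "%r" | sort | uniq -c | sort -rn
}
```

A reason of Resources generally means the job is waiting for nodes to free up, while Priority means other jobs are ahead of it. Running sprio -u $USER breaks a job's priority into its factors, including fairshare.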
How do I get job information via email?
Add to your SLURM script:
#SBATCH --mail-user=<your-email-address>
#SBATCH --mail-type=BEGIN,END,FAIL
How do I modify my bash environment?
Edit ~/.bashrc for interactive shells or ~/.bash_profile for login shells.
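One common kind of edit is adding a directory to PATH. The sketch below shows lines that could go in ~/.bashrc. The helper name path_prepend and the $HOME/bin directory are illustrative assumptions, not cluster conventions.

```shell
# Prepend a directory to PATH only if it is not already there,
# so re-sourcing ~/.bashrc does not grow PATH each time.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present: do nothing
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend "$HOME/bin"           # hypothetical directory
export PATH
```

Run source ~/.bashrc, or log out and back in, for the change to take effect in the current session.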
How do I compress unused data?
tar -czvf archive.tar.gz directory/
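Before deleting the original directory, it is worth confirming that the archive can be read back. The following is a minimal sketch that runs in a temporary directory; the file and directory names are placeholders for your own data.

```shell
#!/bin/bash
# Self-contained demo in a temporary directory; substitute your own paths.
workdir=$(mktemp -d)
mkdir -p "$workdir/directory"
echo "sample data" > "$workdir/directory/results.txt"

# -C changes into workdir first so the archive stores relative paths
tar -czf "$workdir/archive.tar.gz" -C "$workdir" directory/

# List the archive to confirm it is readable before deleting anything
if tar -tzf "$workdir/archive.tar.gz" > /dev/null; then
    rm -r "$workdir/directory"
    echo "archive verified, original removed"
fi
```

To restore the data later, run tar -xzf archive.tar.gz in the directory where it should be unpacked.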
Using the debug QOS
For quick tests (up to 30 minutes):
#SBATCH --qos=debug
#SBATCH --time=00:30:00
I have a deadline and need my job to run now!
Contact help-hpc@caltech.edu to discuss options. See Reservations.
I need to run longer than 7 days
Contact help-hpc@caltech.edu to discuss extended walltime options.
Dependencies and pipelines
Use SLURM job dependencies:
# Submit job that waits for job 12345
sbatch --dependency=afterok:12345 next_job.sh
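For a multi-stage pipeline, the job ID does not have to be copied by hand. A sketch, assuming each stage should run only if the previous one succeeded: sbatch --parsable prints the job ID in a script-friendly form, which can be fed into the next --dependency. The function name submit_chain and the script names in the usage comment are hypothetical.

```shell
# Submit job scripts in order; each waits for the previous one to finish OK.
# Usage: submit_chain preprocess.sh train.sh postprocess.sh
submit_chain() {
    local prev="" script jobid
    for script in "$@"; do
        if [ -z "$prev" ]; then
            jobid=$(sbatch --parsable "$script") || return 1
        else
            jobid=$(sbatch --parsable --dependency=afterok:"$prev" "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the ID
        jobid=${jobid%%;*}
        echo "$script -> $jobid"
        prev=$jobid
    done
}
```

With afterok, a later stage never starts if an earlier one fails; afterany is the variant that runs the next stage regardless of exit status.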
How do I checkpoint before my job hits its walltime?
Ask SLURM to send a signal a fixed number of seconds before the walltime, then trap it in your script to save state:
#SBATCH --signal=B:SIGTERM@120 # send SIGTERM 120s before the time limit
trap 'echo "saving checkpoint..."; ./save_state.sh; exit 0' SIGTERM
./long_running_program &
wait
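The B: prefix sends the signal to the batch script itself rather than the job steps. The program runs in the background followed by wait because bash postpones a trap until the current foreground command finishes, whereas wait returns as soon as the signal arrives.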
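The pattern can be tried locally without submitting a job. In this sketch a subshell stands in for SLURM and sends SIGTERM to the script after one second; sleep 30 stands in for the real program, and the checkpoint file and save_state function are placeholders.

```shell
#!/bin/bash
# Local demonstration of the trap-and-wait pattern; no SLURM needed.
ckpt=$(mktemp)                      # placeholder checkpoint file

save_state() {
    echo "checkpoint written" > "$ckpt"
}

# On SIGTERM: save state, then stop the worker. A real job script
# would typically also exit here, as in the FAQ example.
trap 'save_state; kill "$worker" 2>/dev/null; echo "caught SIGTERM"' TERM

sleep 30 &                          # stand-in for ./long_running_program
worker=$!

# Simulate SLURM's early warning by signalling this shell after 1 second
( sleep 1; kill -TERM $$ ) &

# wait is interrupted by the signal, so the trap runs without delay
wait "$worker" || true
echo "checkpoint file contains: $(cat "$ckpt")"
```

The script finishes after about one second instead of thirty, which shows that the handler ran while the worker was still active.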
How do I check my group’s compute usage?
# Your own recent jobs
sacct -u $USER --starttime=2026-01-01 -o JobID,Elapsed,AllocCPUS,State
# Group-level usage over a period
sreport cluster AccountUtilizationByUser start=2026-01-01
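For a single number rather than a job list, the sacct output can be summed. The sketch below assumes CPU time is the quantity of interest and uses sacct's CPUTimeRAW field, which reports CPU time in seconds. The function name cpu_hours_since is hypothetical.

```shell
# Total CPU-hours for a user's jobs since a given date.
# Usage: cpu_hours_since 2026-01-01 [username]   (defaults to $USER)
cpu_hours_since() {
    # -X: one line per job allocation, so job steps are not counted twice
    # -n: no header, -P: plain delimiter-separated output
    sacct -u "${2:-$USER}" --starttime="$1" -X -n -P -o CPUTimeRAW |
        awk '{ total += $1 } END { printf "%.1f\n", total / 3600 }'
}
```

This counts allocated core time, not how busy the cores were, and it may not match the accounting the center uses for fairshare.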