Quick Start Guide

Get up and running on the cluster in about 15 minutes.

Tip

This is the express lane. For the long-form version, see Getting Started. For the full reference, follow the links in each section.

1. Get an account

Before you can log in, three things must happen — the first is the slowest, so start now:

  1. Your PI emails help-hpc@caltech.edu asking to add you to their group. (New groups: see Account Information.)

  2. Set up Multi-Factor Authentication at access.caltech.edu/my_duo. Duo Mobile on your phone is the easiest option.

  3. Complete eligibility certification at access.caltech.edu/hpc_portal. This is the export-control acknowledgment — see Policies.

You’ll get a confirmation email when your account is ready.

2. Connect via SSH

Important

You must be on the Caltech network — either physically on-campus or connected to Caltech VPN. SSH attempts from anywhere else will hang or be refused. See Common Problems → Connection Refused if the login host is unreachable.

Open a terminal and connect:

ssh username@login.hpc.caltech.edu

You’ll be prompted for your access.caltech password and a Duo push.

On Windows, use the built-in OpenSSH client from PowerShell or Windows Terminal:

ssh username@login.hpc.caltech.edu

Or use MobaXterm for a graphical SSH/SFTP client.

From a browser, use Open OnDemand, a web interface for shells, file management, and interactive apps (Jupyter, RStudio, MATLAB).

Important

You must SSH at least once before using Open OnDemand — the first SSH login creates your home directory.

3. Move some data over

# Upload a single file
scp myfile.txt hpc:~/

# Upload a directory
scp -r mydata/ hpc:/resnick/groups/yourgroup/$USER/

# Download results
scp hpc:/resnick/scratch/$USER/results.txt ./

For large transfers (> ~100 GB), use Globus instead — see Transferring Files.
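
The scp examples above address the cluster as hpc, a short host name this guide does not define. It is presumably an alias set up in the OpenSSH client configuration. A minimal sketch of such an entry in ~/.ssh/config, assuming the login host from step 2, with username as a placeholder for your own account name:

```text
# ~/.ssh/config  (sketch; "hpc" is the alias assumed by the scp examples)
Host hpc
    HostName login.hpc.caltech.edu
    User username
```

With an entry like this in place, ssh hpc behaves like ssh username@login.hpc.caltech.edu, and scp accepts the same alias.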

4. Find and load software

module avail              # List everything available
module spider pytorch     # Search for a specific package
module load python3/3.10.12  # Load a specific version
module list               # Show what's loaded

If something’s missing, request it via help-hpc@caltech.edu — or install it yourself with conda or pip.

5. Submit your first job

Save the following as hello.sh:

#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --output=hello-%j.out
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --time=00:05:00

echo "Hello from $(hostname) at $(date)"
echo "Allocated $SLURM_CPUS_PER_TASK CPUs and $SLURM_MEM_PER_NODE MB RAM"
sleep 30
echo "Done."

Submit, watch, and inspect:

sbatch hello.sh           # → "Submitted batch job 12345"
squeue -u $USER           # Check its state (PD = pending, R = running)
cat hello-12345.out       # See the output
seff 12345                # Check how efficient the request was
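
The job ID printed by sbatch feeds every later command (squeue, cat, seff). A small sketch of capturing it in a shell variable instead of retyping it, assuming the confirmation message has the form shown above. job_id_from_message is a hypothetical helper name, and the sample message stands in for a real sbatch call so the snippet runs anywhere:

```shell
# Extract the numeric job ID from sbatch's confirmation message.
# Example input: "Submitted batch job 12345"
job_id_from_message() {
    # The job ID is the last space-separated field of the message.
    printf '%s\n' "${1##* }"
}

# On the cluster this would be: msg=$(sbatch hello.sh)
msg="Submitted batch job 12345"   # sample message, copied from the example above
jobid=$(job_id_from_message "$msg")

# Print the follow-up commands with the job ID filled in.
echo "squeue -j $jobid"
echo "cat hello-$jobid.out"
echo "seff $jobid"
```

sbatch also has a --parsable option that prints the job ID by itself (followed by a semicolon and the cluster name on multi-cluster setups), which avoids the parsing step.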

Tip

seff reports how much of the requested CPU and memory the job actually used. Scaling your next request down to match, with some headroom, is the single biggest thing you can do to improve queue times.
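
As a worked example of right-sizing, the arithmetic can be sketched as a small shell function. The numbers are hypothetical (this guide does not show real seff output), the 20% default headroom is an assumption rather than a site recommendation, and suggest_mem_mb is an illustrative name:

```shell
# Suggest a memory request in MB: observed peak usage plus a safety margin.
# Arguments: peak memory used (MB), headroom in percent (assumed default: 20).
suggest_mem_mb() {
    local used_mb=$1
    local headroom_pct=${2:-20}
    # Integer ceiling of used_mb * (100 + headroom_pct) / 100
    echo $(( (used_mb * (100 + headroom_pct) + 99) / 100 ))
}

# Hypothetical case: seff reports about 1200 MB used out of the 4G requested.
suggest_mem_mb 1200        # prints 1440
suggest_mem_mb 1200 50     # prints 1800
```

Under these assumptions the next submission could ask for --mem=1440M instead of --mem=4G.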

6. Cheat sheet

Command                   What it does
sbatch script.sh          Submit a batch job
squeue -u $USER           Show your jobs
scancel JOBID             Cancel a job
scancel -u $USER          Cancel all your jobs
srun --pty bash           Quick interactive shell on a compute node
salloc -N 1 -t 1:00:00    Reserve resources for an interactive session
seff JOBID                Efficiency report after a job finishes
sacct -j JOBID            Detailed job accounting
module avail              List installed software
hpcquota                  Check storage usage
sinfo -p gpu              Show GPU partition state

7. Where to put your files

Location                  Quota          Backed up?  Use for
/home/$USER               50 GB          No          Scripts, configs, dotfiles
/resnick/groups/<group>   20 TB          No          Project data, results to keep
/resnick/scratch          Large, shared  No          Active computations, temp files

Warning

Nothing on the cluster is backed up. Files on /resnick/scratch are purged after 14 days without access. Move important results to group storage and copy critical data offsite — see Backups.
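
To see which scratch files are approaching the purge window, one option is to list files by last-access time with find. This is a sketch under the assumption that the purge is driven by access time (atime); the guide does not say how "without access" is measured. list_stale_files is a hypothetical helper name:

```shell
# List regular files under a directory that have not been accessed recently.
# Arguments: directory to scan, age threshold in days.
list_stale_files() {
    local dir=$1
    local days=$2
    # -atime +N matches files last accessed more than N whole days ago
    # (so +10 means at least 11 days).
    find "$dir" -type f -atime +"$days" -print
}

# Example (on the cluster): files getting close to the 14-day limit.
# list_stale_files "/resnick/scratch/$USER" 10
```

Anything the listing turns up that you still need can then be copied to group storage before it is purged.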

Next steps

Example Job Scripts: copy-paste templates for serial, MPI, GPU, arrays, MATLAB, R, GROMACS, AlphaFold

GPU Computing: H100/H200, CUDA, deep learning

AI/ML Guide (AI & Machine Learning): PyTorch, TensorFlow, LLMs, distributed training

Best Practices: right-sizing requests, checkpointing, I/O patterns

Stuck?