System Status

Operational state, scheduled maintenance, and how to check the cluster yourself.

Important

For real-time outage alerts, subscribe to the central-hpc-users@caltech.edu mailing list. During an incident, check your email and the Active Announcements section below before submitting a ticket.

Current Status

| System | Status | Notes |
|---|---|---|
| Login Nodes (login.hpc.caltech.edu) | Operational | Round-robin to login3/login4 |
| Compute Partitions (expansion, any) | Operational | |
| GPU Partition (gpu) | Operational | H100/H200/V100/P100/L40s |
| Open OnDemand (interactive.hpc.caltech.edu) | Operational | Browser-based shells, file management, GUI apps |
| VAST Storage (/resnick) | Operational | Group, home, and scratch storage |
| Globus Endpoint | Operational | Endpoint name: caltech#hpc |

This table is updated manually after each maintenance window, so it can lag behind the real state of the cluster. For a live view, run sinfo or squeue from a login node, or use the dashboards on the landing page.

Active Announcements

Check the Cluster Yourself

Job queues

# Total jobs across the cluster (-h drops the header line so it is not counted)
squeue -h | wc -l

# Pending jobs in the GPU partition
squeue -h -p gpu --state=pending | wc -l

# Your jobs only
squeue -u $USER

# Your pending jobs with start-time estimates
squeue -u $USER --state=pending --start
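
The counts above can be broken down by job state in one pass. The following is a minimal sketch, not a site-provided tool: count_by_state is a hypothetical helper name, and the scheduler is only queried if squeue is found on the PATH.

```shell
# Sketch: tally jobs by state. count_by_state is a hypothetical helper,
# not a Slurm command.

# Reads one job state per line on stdin and prints "STATE COUNT", sorted by state.
count_by_state() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Only query the scheduler where squeue is actually installed.
if command -v squeue >/dev/null 2>&1; then
    # -h drops the header row; -o '%T' prints only the long state name
    # (PENDING, RUNNING, ...).
    squeue -h -p gpu -o '%T' | count_by_state
fi
```

Drop the -p gpu option to tally every partition, or add -u "$USER" to restrict the tally to your own jobs.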

Node availability

# Compact summary by partition + state
sinfo -s

# Idle nodes you could land on right now
sinfo --states=idle

# GPU partition with per-node detail
sinfo -p gpu -N -O nodelist,partition,statelong,gres
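
To see how many idle nodes each partition has at a glance, the sinfo output can be summed per partition. This is a rough sketch under stated assumptions: sum_nodes_by_partition is a hypothetical helper name, and a node that belongs to several partitions is counted once in each.

```shell
# Sketch: idle node count per partition. sum_nodes_by_partition is a
# hypothetical helper, not a Slurm command.

# Reads "PARTITION COUNT" lines on stdin and prints one summed line per partition.
sum_nodes_by_partition() {
    # sinfo marks the default partition with a trailing "*"; strip it first.
    awk '{ sub(/\*$/, "", $1); n[$1] += $2 }
         END { for (p in n) print p, n[p] }' | sort
}

# Only query the scheduler where sinfo is actually installed.
if command -v sinfo >/dev/null 2>&1; then
    # -h drops the header; %P is the partition name, %D the node count.
    sinfo -h --states=idle -o '%P %D' | sum_nodes_by_partition
fi
```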

Storage health

# Your quota across home + group + scratch
hpcquota

# Scratch quota (separate)
hpcquota -s

# Filesystem-level usage
df -h /resnick
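
The df check can be turned into a simple warning. The sketch below is illustrative only: fs_used_percent and THRESHOLD are hypothetical names, and the 90 percent level is an arbitrary example, not a site policy.

```shell
# Sketch: warn when a filesystem is nearly full. fs_used_percent and
# THRESHOLD are hypothetical names chosen for this example.

# Reads `df -P` output on stdin and prints the capacity figure without "%".
fs_used_percent() {
    # With -P, line 2 holds the filesystem and field 5 is the capacity (e.g. 42%).
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

THRESHOLD=90   # example warning level, not a site policy

# Only run the check where the filesystem is mounted.
if [ -d /resnick ]; then
    used="$(df -P /resnick | fs_used_percent)"
    if [ -n "$used" ] && [ "$used" -ge "$THRESHOLD" ]; then
        echo "/resnick is ${used}% full"
    fi
fi
```

This reports filesystem-wide usage only; per-user and per-group limits still come from hpcquota.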

Job efficiency after a run

seff JOBID                                     # CPU + memory utilisation
sacct -j JOBID -o JobID,Elapsed,MaxRSS,State   # Detailed accounting
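
sacct usually reports MaxRSS with a unit suffix (for example 2097152K), which is awkward to compare against a memory request. The sketch below converts such values to MiB. It is an illustration, not part of Slurm: to_mib is a hypothetical helper, JOBID is assumed to be an environment variable you set yourself, and values without a K, M, G, or T suffix are rejected rather than guessed at.

```shell
# Sketch: convert sacct memory values to MiB. to_mib is a hypothetical helper.

# Converts a value such as 2097152K, 512M, or 1.5G to MiB with one decimal.
to_mib() {
    awk -v v="$1" 'BEGIN {
        unit = substr(v, length(v), 1)   # last character is the unit suffix
        num = v + 0                      # leading numeric part of the value
        if (unit == "K") num /= 1024
        else if (unit == "G") num *= 1024
        else if (unit == "T") num *= 1024 * 1024
        else if (unit != "M") { print "unknown unit: " v > "/dev/stderr"; exit 1 }
        printf "%.1f\n", num
    }'
}

# Only query accounting where sacct exists and JOBID has been set by the user.
if command -v sacct >/dev/null 2>&1 && [ -n "${JOBID:-}" ]; then
    # -n drops the header; -P prints fields separated by "|".
    sacct -j "$JOBID" -n -P -o JobID,MaxRSS | while IFS='|' read -r step rss; do
        # Some rows (such as the top-level job line) have an empty MaxRSS.
        if [ -n "$rss" ]; then
            echo "$step $(to_mib "$rss") MiB"
        fi
    done
fi
```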

Notifications

Mailing List

Major outages and maintenance announcements are sent to the central-hpc-users@caltech.edu mailing list. Users are subscribed automatically when they join an existing HPC group.

Reporting an Issue

  1. Check this page first — your issue may already be acknowledged.

  2. Capture the basics: username, job ID(s), partition, the exact command you ran, and the error message.

  3. File the report — email help-hpc@caltech.edu, or open a ticket in the Caltech Help System (ServiceNow).
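
The details listed in step 2 can be gathered into a short text block to paste into the email or ticket. This is only a sketch: build_report is a hypothetical helper, and the field layout is an example rather than a format the support team requires.

```shell
# Sketch: assemble the basics for an issue report. build_report is a
# hypothetical helper; the layout is an example, not a required format.

# Arguments: job ID(s), partition, exact command, error message.
build_report() {
    printf 'Username:  %s\n' "${USER:-unknown}"
    printf 'Job ID(s): %s\n' "$1"
    printf 'Partition: %s\n' "$2"
    printf 'Command:   %s\n' "$3"
    printf 'Error:     %s\n' "$4"
}

# Example call (placeholder values; substitute your own):
#   build_report "JOBID" "gpu" "the exact command you ran" "the error message"
```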

See also

Troubleshooting covers symptom-based diagnosis; Common Problems lists known fixes.