System Status
Operational state, scheduled maintenance, and how to check the cluster yourself.
Important
For real-time outage alerts, subscribe to the central-hpc-users@caltech.edu mailing list. For incidents in progress, check your email and the Active Announcements section below before submitting a ticket.
Current Status
| System | Status | Notes |
|---|---|---|
| Login Nodes ( | Operational | Round-robin to login3/login4 |
| Compute Partitions ( | Operational | |
| GPU Partition ( | Operational | H100/H200/V100/P100/L40s |
| Open OnDemand ( | Operational | Browser-based shells, file management, GUI apps |
| VAST Storage ( | Operational | Group, home, and scratch storage |
| Globus Endpoint | Operational | Endpoint name: |
This table is updated manually after each maintenance window. For the live picture, run sinfo or squeue from a login node, or use the dashboards on the landing page.
Active Announcements
Note
No active announcements.
During an outage in progress, this is the section to watch. The table above is also updated as conditions change.
Check the Cluster Yourself
Job queues
# Total jobs across the cluster (-h drops the header line so it is not counted)
squeue -h | wc -l
# Pending jobs in the GPU partition
squeue -h -p gpu --state=pending | wc -l
# Your jobs only
squeue -u $USER
# Your pending jobs with start-time estimates
squeue -u $USER --state=pending --start
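A single total says little about where the pressure is, so it can help to group the queue by partition and state. The following is a minimal sketch, assuming squeue is on your PATH; `summarize_queue` is an illustrative helper name, not a command provided on the cluster. It relies on the standard squeue format fields `%P` (partition) and `%T` (job state).

```shell
# Count jobs per (partition, state) pair.
# Reads "partition state" lines on stdin and prints "count partition state",
# largest count first. summarize_queue is a hypothetical helper name.
summarize_queue() {
    awk '{ n[$1 " " $2]++ } END { for (k in n) print n[k], k }' |
        LC_ALL=C sort -k1,1nr -k2
}

# Only query Slurm when squeue is actually available (for example on a login node).
if command -v squeue >/dev/null 2>&1; then
    squeue -h -o "%P %T" | summarize_queue
fi
```

Pending array jobs may appear as a single line in squeue output, so treat the counts as an approximation of queue depth rather than an exact task count.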
Node availability
# Compact summary by partition + state
sinfo -s
# Idle nodes you could land on right now
sinfo --states=idle
# GPU partition with per-node detail
sinfo -p gpu -N -O nodelist,partition,statelong,gres
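To narrow the per-node listing down to GPU nodes that could still accept work, the sinfo output can be filtered by state. This is a sketch under assumptions: `usable_nodes` is a hypothetical helper name, and it treats only the plain `idle` and `mixed` states as usable, ignoring states that carry a suffix such as `*`. A `mixed` node has some resources allocated, so its GPUs may already be taken.

```shell
# Keep only nodes that are fully idle or partially allocated ("mixed").
# Reads "node state gres" lines on stdin and prints the matching lines.
# usable_nodes is a hypothetical helper name.
usable_nodes() {
    awk '$2 == "idle" || $2 == "mixed" { print $1, $2, $3 }'
}

# Only query Slurm when sinfo is actually available.
if command -v sinfo >/dev/null 2>&1; then
    # %N = node name, %T = state (long form), %G = generic resources (GRES)
    sinfo -h -p gpu -N -o "%N %T %G" | usable_nodes
fi
```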
Storage health
# Your quota across home + group + scratch
hpcquota
# Scratch quota (separate)
hpcquota -s
# Filesystem-level usage
df -h /resnick
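The filesystem-level check can be turned into a simple threshold warning, for instance at the top of a job script. This is a minimal sketch: `check_fs_usage` is a hypothetical helper name and the 90 percent limit is an arbitrary example value, not a site policy. It parses the POSIX output format of `df -P`, where the fifth column is the capacity and the sixth is the mount point.

```shell
# Report whether a filesystem is above a usage limit (percent, default 90).
# Reads `df -P` output for one filesystem on stdin.
# check_fs_usage is a hypothetical helper name.
check_fs_usage() {
    limit="${1:-90}"
    awk -v limit="$limit" 'NR == 2 {
        used = $5
        sub(/%/, "", used)
        status = (used + 0 >= limit + 0) ? "WARNING" : "OK"
        print status ": " $6 " is " used "% full"
    }'
}

# Only run the check where the filesystem is mounted.
if [ -d /resnick ]; then
    df -P /resnick | check_fs_usage 90
fi
```

This reports usage of the whole filesystem, not your personal or group quota; hpcquota remains the tool for that.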
Job efficiency after a run
seff JOBID # CPU + memory utilisation
sacct -j JOBID -o JobID,Elapsed,MaxRSS,State # Detailed accounting
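sacct reports MaxRSS per job step, so the figure that matters for sizing the next memory request is the largest value across steps. The sketch below extracts it, assuming sacct is asked for parsable output in a single unit (`-n -P --units=M`); `peak_rss` is a hypothetical helper name, and `JOBID` is an environment variable you would set yourself.

```shell
# Print the largest MaxRSS across all steps of a job.
# Reads "JobID|MaxRSS|State" lines on stdin (sacct -n -P output) and assumes
# every value uses the same unit suffix, as with --units=M.
# peak_rss is a hypothetical helper name.
peak_rss() {
    awk -F'|' '
        $2 != "" { v = $2; sub(/[KMGTP]$/, "", v); if (v + 0 > max) max = v + 0 }
        END { print max + 0 }'
}

# Only query accounting when sacct is available and JOBID has been set.
if command -v sacct >/dev/null 2>&1 && [ -n "${JOBID:-}" ]; then
    sacct -j "$JOBID" -n -P --units=M -o JobID,MaxRSS,State | peak_rss
fi
```

The job allocation line itself has an empty MaxRSS field, which is why empty values are skipped.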
Notifications
Mailing List
Major outage and maintenance announcements are sent to the central-hpc-users@caltech.edu mailing list. Users are subscribed automatically when they join an existing HPC group.
Reporting an Issue
1. Check this page first: your issue may already be acknowledged.
2. Capture the basics: username, job ID(s), partition, the exact command you ran, and the error message.
3. File the report: email help-hpc@caltech.edu, or open a ticket in the Caltech Help System (ServiceNow).
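The basics listed above can be gathered in one go and pasted into the email or ticket. This is a minimal sketch, not a site-provided tool: `collect_report` is a hypothetical helper name, and the command and error text are passed in by hand because they cannot be recovered automatically. When sacct is available, it appends the partition, state, and exit code recorded for the job.

```shell
# Print the details a support request needs.
# Usage: collect_report JOBID "exact command" "error message"
# collect_report is a hypothetical helper name.
collect_report() {
    jobid="$1"; cmd="$2"; err="$3"
    echo "username: ${USER:-$(id -un)}"
    echo "host:     $(uname -n)"
    echo "date:     $(date '+%Y-%m-%d %H:%M:%S %Z')"
    echo "job id:   $jobid"
    echo "command:  $cmd"
    echo "error:    $err"
    if command -v sacct >/dev/null 2>&1; then
        # One line per job allocation (-X), without the individual steps.
        sacct -j "$jobid" -X -o JobID,Partition,State,ExitCode
    fi
}
```

For example, `collect_report 12345 "sbatch run.sh" "slurmstepd: error: ..." > report.txt` writes the summary to a file, where the job ID and error text are placeholders for your own values.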
See also
Troubleshooting for symptom-based diagnosis. Common Problems for known fixes.