Glossary
Common HPC terms and their definitions.
A
- Allocation
A grant of computing resources (CPU hours, storage) to a research group.
- Apptainer
Container runtime for HPC (formerly Singularity). Allows running containerized applications without root privileges.
B
- Batch Job
A job submitted to the scheduler to run without user interaction. Contrast with interactive job.
C
- Cluster
A collection of interconnected computers (nodes) that work together as a single system.
- Compute Node
A server in the cluster dedicated to running user jobs. Users cannot log in directly; jobs are assigned by the scheduler.
- Container
A lightweight, standalone package containing code and dependencies. See Apptainer, Docker.
- Core
A single processing unit within a CPU. Modern CPUs have multiple cores (e.g., 64 cores per CPU).
- CPU (Central Processing Unit)
The main processor in a computer. HPC nodes typically have 1-2 CPUs with many cores each.
D
- Docker
Popular container platform. Docker images can be converted to Apptainer format for HPC use.
E
- Environment Module
System for managing software packages. Users load modules to access specific software versions.
F
- Fairshare
Scheduling policy that balances resource allocation based on historical usage. Groups that recently used many resources get lower priority.
- FLOPS
Floating-point Operations Per Second. Measure of computing performance.
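As a rough illustration of how a FLOPS figure is derived, the sketch below estimates theoretical peak performance from core count, clock rate, and floating-point operations per cycle. The function name `peak_flops` and every hardware value in it are hypothetical and do not describe any particular cluster.

```python
def peak_flops(nodes: int, cpus_per_node: int, cores_per_cpu: int,
               clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = total cores x clock rate x FLOPs per core per cycle."""
    total_cores = nodes * cpus_per_node * cores_per_cpu
    return total_cores * clock_hz * flops_per_cycle


# Hypothetical node: 2 CPUs, 64 cores each, 2.0 GHz, 16 FLOPs per cycle per core.
example = peak_flops(nodes=1, cpus_per_node=2, cores_per_cpu=64,
                     clock_hz=2.0e9, flops_per_cycle=16)
print(f"{example / 1e12:.3f} TFLOPS")
```

Real applications usually sustain only a fraction of this theoretical peak, so the number is best read as an upper bound.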
G
- Globus
High-performance file transfer service for moving large datasets between systems.
- GPU (Graphics Processing Unit)
Specialized processor for parallel computations. Essential for deep learning and certain scientific simulations.
H
- Head Node
Server that manages the cluster, runs the scheduler, and routes network traffic.
- HPC (High Performance Computing)
Using supercomputers and clusters to solve complex computational problems.
- HTC (High Throughput Computing)
Running many independent jobs, optimizing for total work completed rather than individual job speed.
I
- InfiniBand
High-bandwidth, low-latency network interconnect used between cluster nodes, typically faster than standard Ethernet.
- Interactive Job
A job where the user directly interacts with the compute node via terminal. Contrast with batch job.
J
- Job
A unit of work submitted to the cluster scheduler.
- Job Array
A collection of similar jobs submitted as a single entity, each with a unique index.
- Job Script
A shell script containing SLURM directives (lines beginning with #SBATCH) followed by the commands to execute.
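To make the job array index concrete, here is a minimal sketch of how a program running inside an array task might use its index to pick one input. SLURM sets the `SLURM_ARRAY_TASK_ID` environment variable for each array task; the helper name `select_input` and the file names are hypothetical.

```python
import os


def select_input(files, env=os.environ):
    """Return the input assigned to this array task.

    Assumes the array was submitted with indices 0..len(files)-1.
    """
    task_id = int(env["SLURM_ARRAY_TASK_ID"])
    if not 0 <= task_id < len(files):
        raise IndexError(f"array index {task_id} has no matching input")
    return files[task_id]


# Demo with a fake environment so the sketch also runs outside SLURM.
inputs = ["sample_a.dat", "sample_b.dat", "sample_c.dat"]  # hypothetical names
print(select_input(inputs, {"SLURM_ARRAY_TASK_ID": "1"}))
```

Inside a real array task, calling `select_input(inputs)` with no second argument would read the index from the actual environment.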
L
- Login Node
Server where users connect via SSH. Used for file editing, job submission, and light tasks. Not for heavy computation.
M
- MFA (Multi-Factor Authentication)
Security requiring multiple verification methods (password + Duo app).
- Module
See Environment Module.
- MPI (Message Passing Interface)
Standard for parallel programming across multiple nodes. Programs use MPI to communicate between processes.
N
- Node
A single computer/server in the cluster.
- NUMA (Non-Uniform Memory Access)
Memory architecture where access time depends on memory location relative to the processor.
O
- OOD (Open OnDemand)
Web-based interface for accessing HPC resources via browser.
- OpenMP
API for shared-memory parallel programming. Programs use OpenMP for multi-threaded parallelism within a single node.
P
- Parallel Computing
Running computations simultaneously across multiple cores, GPUs, or nodes.
- Partition
A logical grouping of nodes in SLURM (e.g., a gpu partition for GPU nodes).
- PI (Principal Investigator)
Faculty member leading a research group. PIs authorize group membership and manage allocations.
- PTA
Project-Task-Award. Caltech’s accounting code for charging compute usage.
Q
- QOS (Quality of Service)
SLURM setting that modifies job priority or limits. Example: debug QOS for quick test jobs.
- Queue
List of jobs waiting to run. Jobs are selected based on priority and resource availability.
- Quota
Storage limit (e.g., 50 GB home directory quota).
S
- SBATCH
SLURM command (typed as sbatch) to submit a batch job.
- Scheduler
Software that manages the job queue and allocates resources. Caltech uses SLURM.
- Scratch
Temporary high-performance storage for active computations. Files are automatically deleted after 14 days.
- SIF (Singularity Image Format)
Container image format used by Apptainer/Singularity.
- SLURM (Simple Linux Utility for Resource Management)
The job scheduler and resource manager used on the cluster.
- SSH (Secure Shell)
Protocol for secure remote login and command execution.
- SRUN
SLURM command (typed as srun) to run parallel tasks within a job.
T
- Task
A single process in a SLURM job. A job can have multiple tasks.
- Thread
A lightweight execution unit within a process. Multi-threaded programs run multiple threads within one process, typically spread across several cores.
V
- VAST
High-performance storage system. Current primary storage at Caltech HPC.
- VPN (Virtual Private Network)
Secure connection to campus network from off-campus locations.
W
- Walltime
Maximum allowed runtime for a job. Jobs exceeding walltime are terminated.
- Worker Node
See Compute Node.
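Walltime limits in SLURM are commonly written as strings such as `30`, `02:30:00`, or `1-12:00:00`. The sketch below converts such a string to seconds, assuming the time formats SLURM documents for its time limit option (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds). The function name `walltime_to_seconds` is hypothetical.

```python
def walltime_to_seconds(spec: str) -> int:
    """Convert a SLURM-style time limit string to a number of seconds."""
    days = 0
    if "-" in spec:
        # Forms with a day prefix: D-HH, D-HH:MM, D-HH:MM:SS
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        if len(parts) > 3:
            raise ValueError(f"unrecognized walltime: {spec!r}")
        parts += [0] * (3 - len(parts))  # pad missing minutes/seconds
        hours, minutes, seconds = parts
    else:
        parts = [int(p) for p in spec.split(":")]
        if len(parts) == 1:      # MM
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:    # MM:SS
            hours, minutes, seconds = 0, parts[0], parts[1]
        elif len(parts) == 3:    # HH:MM:SS
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"unrecognized walltime: {spec!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(walltime_to_seconds("1-12:00:00"))
```

A helper like this can be handy when comparing a requested walltime against measured runtimes from earlier jobs.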
Acronym Reference
| Acronym | Meaning |
|---|---|
| CPU | Central Processing Unit |
| GPU | Graphics Processing Unit |
| HPC | High Performance Computing |
| HTC | High Throughput Computing |
| MFA | Multi-Factor Authentication |
| MPI | Message Passing Interface |
| OOD | Open OnDemand |
| PI | Principal Investigator |
| QOS | Quality of Service |
| SIF | Singularity Image Format |
| SLURM | Simple Linux Utility for Resource Management |
| SSH | Secure Shell |
| VPN | Virtual Private Network |