
Relion using SBGRID

Slurm Relion Docs

Submitting Slurm jobs via the Relion interface


Source the SBGrid software compilation and export the required Relion variables by adding the following to your ~/.bashrc. Be sure to either log out and back in, or source ~/.bashrc, after making the change.


# Source SBGrid (note: sourcing SBGrid in both ~/.bashrc and the Relion job submission template has caused job failures)
source /programs/sbgrid.shrc

# RELION VARIABLES (these will show up in the Relion GUI)
export RELION_QSUB_EXTRA_COUNT=3
export RELION_QSUB_EXTRA1="Walltime in hours"
export RELION_QSUB_EXTRA2="Mem per CPU"
export RELION_QSUB_EXTRA3="GRES"
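

To apply the change without logging out, re-source the file and confirm the variables are set (names as defined above):

source ~/.bashrc
echo $RELION_QSUB_EXTRA_COUNT    # should print 3
echo "$RELION_QSUB_EXTRA1"       # should print: Walltime in hours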


Create default SBGRID preferences


Default software packages for SBGrid can be changed by creating and editing ~/.sbgrid.conf. The defaults should normally work fine, but version overrides can be set if required.


#RELION_X=3.1.0_cu10.2
#CUDA_X=10.2
#OPENMPI_X=3.1.6_slurm-20.02.03
#PYTHON_X=3.7.0
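

After editing ~/.sbgrid.conf, start a new shell (or re-source /programs/sbgrid.shrc) so the overrides take effect. A quick sanity check, assuming Relion is on the PATH via SBGrid:

source /programs/sbgrid.shrc
which relion    # the reported path should reflect the pinned version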


Create the Relion Slurm submission template


The submission template is used to generate the actual Slurm submission script on a per-job basis. Be sure this template is referenced in the Relion GUI (the "Standard submission script" field on the Running tab) and that the queue submit command is set to sbatch.


#!/bin/bash

#SBATCH --partition=any
#SBATCH --nodes=1   # number of nodes
#SBATCH --ntasks=XXXmpinodesXXX          # number of MPI processes
#SBATCH --cpus-per-task=XXXthreadsXXX    # threads per MPI process
#SBATCH --mem-per-cpu=XXXextra2XXX       # "Mem per CPU" field from the GUI
#SBATCH -t XXXextra1XXX:00:00            # "Walltime in hours" field from the GUI
#SBATCH --gres=XXXextra3XXX              # "GRES" field from the GUI
#SBATCH --error=XXXerrfileXXX
#SBATCH --output=XXXoutfileXXX
 
# LOAD MODULES, INSERT CODE, AND RUN YOUR PROGRAMS HERE
# Uncomment the cuda/openmpi module load below if you need it.
#module load cuda/11.2 openmpi/4.1.1_cuda-11.2
# INFO
echo "Starting at `date`"
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."

# NOTE: #SBATCH directives placed after the first executable command are
# ignored, so the environment (including the SBGrid setup sourced in
# ~/.bashrc) is exported to the job via srun's --export=ALL below.
 
# RUN
srun --export=ALL --mpi=pmi2 XXXcommandXXX
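

For reference, Relion fills in the XXX...XXX placeholders at submission time. With 4 MPI tasks, 6 threads per task, and the three extra fields set to 24 (hours), 4G, and gpu:2, the generated header would look roughly like this (values and output paths are illustrative; Relion chooses the paths per job):

#SBATCH --partition=any
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=6
#SBATCH --mem-per-cpu=4G
#SBATCH -t 24:00:00
#SBATCH --gres=gpu:2
#SBATCH --error=Refine3D/job042/run.err
#SBATCH --output=Refine3D/job042/run.out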


Troubleshooting

Error: An ORTE daemon has unexpectedly failed after launch and before communicating back to mpirun. This could be caused by a number of factors, including an inability to create a connection back to mpirun due to a lack of common network interfaces and/or no route found between them.

Solution: Try using srun rather than mpirun/mpiexec in Relion's job template. After updating the template, verify the changes are actually picked up at run time by inspecting the log files Relion writes for the newly submitted job.
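
One way to verify, assuming a job output directory such as Refine3D/job042 (path illustrative): Relion saves the rendered submission script in the job directory (typically run_submit.script), so the generated command line can be checked directly.

grep srun Refine3D/job042/run_submit.script
tail -n 20 Refine3D/job042/run.err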

Other useful SBGRID commands


Verify the version and other information for an active package


sbgrid-info -l relion