Resource management tools

This blog mainly discusses some tips for commonly used tools that allocate resources for distributed systems and parallel computing.

SLURM

Slurm is a popular workload manager for HPC clusters. Here is its detailed documentation.

Users can specify how many nodes they want, how many cores, how much memory, etc.

The idea is to submit a job into the associated queue, and the manager will allocate resources according to the job's information.
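As a minimal sketch, a Slurm job script declares the requested resources in `#SBATCH` directives and then launches the program; the partition name, resource sizes, and program path below are placeholders that depend on the cluster:

```shell
#!/bin/bash
#SBATCH --job-name=hello          # job name shown in squeue
#SBATCH --partition=gpu           # queue/partition name (cluster specific)
#SBATCH --nodes=2                 # number of nodes
#SBATCH --ntasks-per-node=4       # tasks (ranks) per node
#SBATCH --cpus-per-task=2         # cores per task
#SBATCH --mem=16G                 # memory per node
#SBATCH --time=00:30:00           # wall-clock limit

srun ./my_program                 # launch the program on the allocated resources
```

Saved as `job.sh`, this would be submitted with `sbatch job.sh`.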

Here are some tips

search queue info

squeue -p gpu

cancel job

scancel <jobid>

submit job

sbatch <job script>

show quota

showquota

scontrol

The scontrol show hostnames command lists all host names of a running job. If the program needs to access the allocated nodes, we can use this command. It is really helpful for programs that require service registration for a master process, since the IP addresses change each time there is a new allocation.
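A small sketch of this pattern inside a job script (the choice of the first node as master is just a convention for illustration):

```shell
#!/bin/bash
# Inside a Slurm job script: expand the compact node list
# (e.g. "node[01-04]") into one host name per line.
HOSTS=$(scontrol show hostnames "$SLURM_JOB_NODELIST")

# Use the first allocated node as the master for service registration;
# its address is only known after the allocation is granted.
MASTER=$(echo "$HOSTS" | head -n 1)
echo "master node: $MASTER"
```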

commonly used parameters for srun

--nodes specifies how many nodes will be used; --constraint selects a specific type of node (such as a specific CPU architecture; the available features depend on the cluster's configuration); -c specifies how many cores are used for each task (rank); and -n specifies how many tasks (ranks) to launch.
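Putting these together, a launch line might look like the following (the `skylake` feature name and the program path are placeholders; valid features come from the cluster's configuration):

```shell
# 2 nodes, 8 tasks (ranks) in total, 4 cores per task,
# restricted to nodes that advertise the "skylake" feature
srun --nodes=2 -n 8 -c 4 --constraint=skylake ./my_program
```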

Other tips

Each srun command starts a new job step. It is important to specify the memory associated with each srun command; otherwise, it assumes it can use all the memory of each allocated node.
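A sketch of why this matters when running steps side by side inside one allocation (the program names are placeholders):

```shell
# Two concurrent job steps inside one allocation.
# Without --mem, each step would claim all of the node's memory,
# so the second srun would have to wait for the first to finish.
srun --mem=8G -n 1 ./producer &
srun --mem=8G -n 1 ./consumer &
wait    # block until both steps complete
```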

MPIRUN

This question discusses some differences between srun and mpirun. From my understanding, srun is preferred over using mpirun directly.

Here is an example of how to set the number of OpenMP threads for mpirun:

mpirun -n 1 --bind-to none -x OMP_NUM_THREADS=1 ...

Otherwise, if we run mpirun -n <num> <executable> without specifying the number of OpenMP threads, it is possible that each process uses multiple threads and oversubscribes the cores. Good practice is to always set export OMP_NUM_THREADS=<number> explicitly. How the threads are scheduled then depends on the operating system.
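A minimal hybrid MPI+OpenMP launch, assuming Open MPI (the -x and --bind-to flags are Open MPI specific, and the program path is a placeholder):

```shell
# 4 MPI ranks, 2 OpenMP threads each. --bind-to none lets each rank's
# threads spread over cores instead of being pinned to a single core,
# and -x forwards the environment variable to every rank.
export OMP_NUM_THREADS=2
mpirun -n 4 --bind-to none -x OMP_NUM_THREADS ./hybrid_program
```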

LSF

LSF is also designed for HPC clusters; it is developed by IBM. The Summit cluster uses this scheduler.

One main difference between LSF and Slurm is that LSF uses the concept of the resource set: users do not control nodes directly, they just specify how many resource sets they need.

basic control

bsub <job script> submits the job

bkill <job id> kills the job

jsrun launches a specific process; in the job script, we use jsrun to start the program

bjobs -l <jobid> shows the details of a job

jslist -c <allocation id> shows the detailed resources of an allocation. The allocation id is printed in the bjobs output.

This document contains detailed information about running jobs, along with several specific use cases, such as how to fully use all allocated resources.

This document shows how to run multiple jsrun commands at the same time.
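A minimal sketch of that pattern, launching several jsrun steps concurrently within one allocation (the program names and resource-set counts are placeholders):

```shell
# Two jsrun steps run in the background inside one bsub allocation;
# each step takes 3 resource sets with 1 task, 7 cores, and 1 GPU per set.
jsrun -n 3 -a 1 -c 7 -g 1 ./task_a &
jsrun -n 3 -a 1 -c 7 -g 1 ./task_b &
wait    # block until both steps finish
```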

The topology figure of an HPC computing node in that document is important for illustrating these concepts.

One special concept of LSF is the resource set. This is similar to the concept of a container, which provides a good resource-isolation view, and users can organize resources flexibly within one node. The benefit is that when we use a node with multiple GPUs, resource isolation decreases the difficulty of programming: the program just needs to control one GPU instead of multiple GPUs at the same time.

For experiments that need more than one node, a good way to determine the number of resource sets is to use the GPU as the indicator. For example, a Summit node has 6 GPUs, so try to use 6 resource sets; since there are 42 cores, assign 42/6 = 7 cores to each resource set. One task per resource set is easy to manage. Then we just need to adjust the number of resource sets when using a different number of processes.

-n 1 -a 1 -c 7 -b packed:7 -g 1
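The arithmetic behind this choice can be checked with a tiny shell snippet:

```shell
# Cores per resource set when one resource set is created per GPU
# (Summit: 42 usable cores and 6 GPUs per node)
cores_per_node=42
gpus_per_node=6
cores_per_rs=$((cores_per_node / gpus_per_node))
echo "$cores_per_rs"    # 7
```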

Another way is to use the CPU as the unit of distinction, for example, one resource set per node, with one MPI program using all the CPUs:

-n 1 -a 1 -c 42 -b packed:42 -g 0

These common use cases and jsrun examples show all the details.

This document lists all the details of the Spectrum LSF commands. These commands are really similar to the K8s ones. Slurm can also provide similar functionality.

bparams -a shows all settings for LSF; some capabilities may be disabled, such as elastic jobs.

Docker and k8s

Docker and k8s are mainstream tools for managing clusters in cloud environments. We do not list details here; there are all kinds of resources online. The idea is similar to Slurm and LSF: we can allocate resources for a job and run a specific program based on Docker. There is better support for web applications.

K8s is still very popular; if you have a good understanding of it, you may get a decent job offer from a high-tech company. However, it only focuses on the resource-management layer, which is not much different in concept from Slurm and LSF. You may also call it PaaS in the scope of cloud computing.

Test program

One tricky thing is that sometimes you are not sure whether the scheduler actually works as you expected.

Here are some examples for testing:

MPI+Openmp

https://github.com/wangzhezhe/parallelBLK/tree/master/benchmark/mpi_openmp

MPI+Cuda

This is an example for testing runs on an ORNL cluster:

https://code.ornl.gov/t4p/Hello_jsrun
