ERP Cluster
Specifications
The cluster consists of 14 nodes connected via InfiniBand. Each node has two 24-core 2.3 GHz AMD EPYC 7451 processors and 128 GB of system memory. erp13 and erp14 are higher-memory nodes with 512 GB of memory each.
Performance
An ERP node with two AMD EPYC 7451 processors can theoretically perform
2 sockets * 24 cores/socket * 8 flops/cycle/core * 2.9 Giga-cycles/second = 1.1 TeraFLOPs double precision
using an 8 flops/cycle/core rate and a 2.9 GHz all-core boost clock rate.
Each ERP node achieves 200 GB/s of memory bandwidth running the STREAM triad benchmark; this result is from testing on a similar system.
Documentation
The following materials provide cluster users with details on architecture and performance tuning.
- Dell EPYC performance study: discusses NUMA effects and socket locality to the network interface
Accessing the System
Note: Not all projects have access to the cluster. Job submissions to Slurm may be rejected even if access to the front end node is authorized.
Running on the cluster first requires connecting to its front end node, erpfen01, which is accessible from the landing pads.
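A minimal connection sketch, assuming you are already logged in to one of the CCI landing pads (hostnames and credentials are site- and account-specific):
# from a landing pad session, connect to the ERP front end node
ssh erpfen01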
Compiling
The erpfen01 front end node is virtualized and is not set up for compiling for the compute nodes. Please either compile on erp01 or allocate a himem node:
salloc -p himem -N 1 -t 30
then ssh to the allocated node to build your software.
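A sketch of that workflow, assuming an interactive salloc session (erp13 and the make-based build stand in for whichever node Slurm assigns and your actual build system):
salloc -p himem -N 1 -t 30      # request one himem node for 30 minutes
squeue -u $USER                 # note the node Slurm assigned (NODELIST column)
ssh erp13                       # connect to the assigned node (erp13 here as an example)
cd /path/to/source && make      # build on the compute node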
Modules
On erp01 or a himem node, modules are loaded from the following path:
/gpfs/u/software/erp-spack-install/lmod/linux-centos7-x86_64/Core/
If running module av does not list any modules from this location, running the following command will fix that:
module use /gpfs/u/software/erp-spack-install/lmod/linux-centos7-x86_64/Core
See /gpfs/u/software/erp-spack-install/README for the latest available module sets using specific compiler and MPI combinations.
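A minimal sketch of a typical session (gcc and openmpi match the modules used in the job script below; exact versions and available combinations will vary):
module use /gpfs/u/software/erp-spack-install/lmod/linux-centos7-x86_64/Core
module av              # list the modules visible from this path
module load gcc        # load a compiler ...
module load openmpi    # ... and a matching MPI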
SMT
Currently, SMT is not enabled on the cluster nodes. By default, Slurm will assign 48 processes to each node to match the 48 available cores.
Submitting and Managing Jobs
The ERP cluster, unlike other clusters at CCI, uses consumable resources: Slurm schedules CPU cores and memory independently rather than assuming a job will utilize whole nodes. Users should make note of the following when submitting jobs:
- Jobs should include a per-node memory constraint (--mem) or a per-CPU memory constraint (--mem-per-cpu) that is sufficient for the highest-demand process. (Non-homogeneous jobs may have different memory requirements across the job; the highest requirement should be used as the constraint.)
- Processes that utilize threads will need CPU count constraints. By default the allocation method is 1 CPU per process or task. Jobs that require multiple CPUs per process or task (for threads, OpenMP, etc.) should add a --cpus-per-task constraint to allocate additional CPU cores; see the example script after this list. Explicit binding may also be necessary depending on the application (see --cpu_bind in the srun man page).
- Parallel job arrays will require some additional parameters to subdivide resources assigned to a job. See the Job arrays/Many small jobs section for more information.
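A minimal sketch of a hybrid MPI/OpenMP submission script applying the constraints above (the partition, node/task counts, memory figure, module names, and executable path are illustrative assumptions to adapt for your job):
#!/bin/bash
#SBATCH -p erp                  # partition (see the Partitions table below)
#SBATCH -t 60                   # time limit in minutes
#SBATCH -N 2                    # nodes
#SBATCH -n 16                   # MPI tasks (8 per node)
#SBATCH --cpus-per-task=6       # CPU cores per task, used for OpenMP threads
#SBATCH --mem-per-cpu=2G        # per-CPU memory constraint
module use /gpfs/u/software/erp-spack-install/lmod/linux-centos7-x86_64/Core
module load gcc
module load openmpi
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun --mpi=pmi2 /path/to/executable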
Partitions

| Name  | Time Limit (hr) | Max Nodes |
| ----- | --------------- | --------- |
| debug | 1               | 14        |
| erp   | 6               | 12        |
| himem | 4               | 2         |
Example job submission scripts
Please see the Slurm page for more info.
The work distribution, communication patterns, and programming model will guide the selection of a process binding. Users seeking maximum performance or efficiency should review the Slurm and EPYC architecture documentation to determine a suitable process binding.
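For illustration, two common srun binding choices (the executable path is a placeholder; the srun man page lists the full set of --cpu_bind options, including the explicit map_ldom maps used in the script below):
srun --cpu_bind=verbose,cores /path/to/executable     # bind each task to a single core and report the binding
srun --cpu_bind=verbose,sockets /path/to/executable   # bind tasks to whole sockets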
Exclusive node allocation, MPI (OpenMPI) Only
The following job script and submission command will allocate all the resources on each of the requested nodes.
Create a file named run.sh with the following contents:
#!/bin/bash
# make the ERP module set visible and load a compiler and MPI
module use /gpfs/u/software/erp-spack-install/lmod/linux-centos7-x86_64/Core/
module load gcc
module load openmpi
# map the 48 tasks on each node onto its 8 NUMA locality domains (6 tasks per domain) and report the binding
bindArg="--cpu_bind=verbose,map_ldom=4,4,4,4,4,4,5,5,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7,7,0,0,0,0,0,0,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3"
srun --mpi=pmi2 ${bindArg} /path/to/executable
Submission command:
sbatch -p <partition> -t <time_limit> -N <nodes> -n <tasks> --mincpus=48 ./run.sh
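For example, an illustrative submission of two whole nodes on the erp partition (96 tasks = 2 nodes x 48 cores; adjust the partition, time, and counts for your job):
sbatch -p erp -t 60 -N 2 -n 96 --mincpus=48 ./run.sh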