
LAMMPS

The following documented steps have been confirmed to work with the system-installed GCC 8.4.1 as well as the xl_r/16.1.1 module on the DCS platform as of August 5th, 2021.

Barebones LAMMPS Installation Steps

On DCS Front End Node:

1) Load Modules

#If you are using the IBM XL_r compiler, load this module; otherwise skip it (e.g. if using the system GCC)
module load xl_r

#Load these for both xl_r and GCC
module load cmake
module load spectrum-mpi
module load cuda

2) Clone LAMMPS

mkdir ~/scratch/lammps-dev
cd ~/scratch/lammps-dev
git clone https://github.com/lammps/lammps.git

3) Install LAMMPS

mkdir build
cd build

cmake ../cmake -D CMAKE_BUILD_TYPE=Release -D BUILD_MPI=yes -D BUILD_OMP=no -D LAMMPS_MACHINE=AiMOS -D CMAKE_CXX_FLAGS="-O3 -std=c++11" -D PKG_GPU=yes -D GPU_API=cuda -D GPU_PREC=mixed -D GPU_ARCH=sm_70

make -j 8

4) Test Run:

Request interactive slurm session:

salloc -N 1 --gres=gpu:6 -t 180

ssh into the reserved node and execute the following:

cd ~/scratch/lammps-dev/lammps
mpirun -np 40 ./build/lmp_AiMOS -sf gpu -v x 10 -v y 10 -v z 10 -v t 200 < ./bench/in.lj

Offline Installation with External Dependencies

Some cmake presets for LAMMPS will attempt to download and install third party dependencies if they are not found. This can be a problem on CCI systems due to the security firewall. Fortunately, LAMMPS provides a pathway to pre-download these dependencies on your local machine instead.

IMPORTANT NOTE: Eigen Dependency Bug

The Eigen dependency, which is brought in by the most.cmake preset, has a known bug when compiled with the IBM XL C/C++ compiler. This is not something we have control over; discussion of the bug can be found at the following Eigen GitLab issue: https://gitlab.com/libeigen/eigen/-/issues/1555. LAMMPS can still be built successfully using the DCS GCC compiler with a few additional steps.

Other Notes

1) CCI Firewall: As mentioned above, depending on the specific cmake preset you use to build LAMMPS, there are various third-party dependencies that LAMMPS will attempt to download and install. Due to the CCI security firewall, any http(s) request is blocked unless it targets the pre-approved list of trusted domains reachable via the Proxy (https://secure.cci.rpi.edu/wiki/landingpads/Proxy/). Many of the third-party dependencies that LAMMPS attempts to download are not hosted on that shortlist of pre-approved domains, so we need to set up the LAMMPS dependencies in an "offline" mode. Fortunately, LAMMPS provides scripts to make this possible.

2) CCI AiMOS System Python Version: To use the offline mode mentioned above, we need the scripts found in lammps/tools/offline. Specifically, on AiMOS we will use the use_caches.sh script, which configures your shell to look in a specific location for the needed dependencies' source tarballs (.tar.gz archives) instead of downloading them from the internet. For a few dependencies, this involves starting a local Python HTTP server that intercepts http(s) requests and redirects them to that location on AiMOS. This requires Python 3.7+, so we must either install Miniconda to provide a newer Python interpreter or alias another Python version already installed on the system, e.g. alias python=python3.8 (see the sketch after these notes).

3) CCI Firewall Revisited: However, once we set up the Python HTTP server to intercept the http requests, it will fail if you have configured the CCI Proxy (https://secure.cci.rpi.edu/wiki/landingpads/Proxy/). This is because, ironically, http://localhost (where the Python server runs) is not whitelisted by the proxy. So if you have the proxy set up (via the environment variables http_proxy and https_proxy), we will have to selectively disable it at a specific point in the installation.
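A minimal sketch of checking both of these notes on a front-end node before starting (python3.8 is just the example version used above; your available interpreter may differ):

# Note 2: the offline scripts need Python 3.7+; alias a newer system interpreter if one exists.
python3 --version
command -v python3.8 >/dev/null && alias python=python3.8

# Note 3: check whether the CCI proxy variables are currently set in your shell.
env | grep -i _proxy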

Installation Steps

1) On your local machine (not CCI), clone the LAMMPS GitHub repository and run the script that downloads all of the necessary dependencies. This takes about 500 MB of space and stores all of the source tarballs in ~/.cache/lammps by default (remember to delete this when you are done!). We then tar up this directory and scp it over to CCI. Depending on your internet connection, this may take some time.

a) Clone LAMMPS and create a cache of dependency tarballs

git clone https://github.com/lammps/lammps.git
cd lammps
./tools/offline/init_caches.sh
cd ~
cd .cache
tar -czf lammps_cache.tar.gz lammps

b) Send the tarball of all of the dependency cache to CCI

scp ./lammps_cache.tar.gz YOUR_CCI_USERNAME@blp01.ccni.rpi.edu:~/.

2) Install Miniconda: On a CCI front-end node (not a landing pad). AiMOS (DCS) is used as the example; its system architecture is ppc64le, so it needs the ppc64le Miniconda installer. Set up and install Anaconda/Miniconda according to https://secure.cci.rpi.edu/wiki/software/Conda/ (download, execute the installation script, agree to the license terms, say 'yes' when asked to run conda init, then activate the conda environment).

cd ~
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-ppc64le.sh
bash Miniconda3-latest-Linux-ppc64le.sh -p ~/barn/miniconda3
#agree to the terms, say yes when it asks if you want to run conda init
source ~/.bashrc

Alternatively, you could use a system-installed Python version via alias python=python3.8 instead.

3) Create LAMMPS Dependency Cache: On the CCI system (preferably on the front-end node of whatever system you're using, e.g. dcsfen01), we're going to unpack the tarball we sent to our home directory.

a) Create .cache directory if it doesn't exist in your home directory and extract the lammps dependency tarball there.

cd ~
mkdir -p .cache
cd .cache
tar -xzf ~/lammps_cache.tar.gz

4) Clone LAMMPS and set up the external dependency redirect:

a) Clone LAMMPS (wherever you prefer, barn is used as example)

cd ~/barn
git clone https://github.com/lammps/lammps.git

b) Set up the dependency redirect

cd lammps
source tools/offline/use_caches.sh

This essentially configures your current shell environment to fetch dependencies from a locally hosted Python server started by use_caches.sh, which serves them from ~/.cache/lammps instead of the internet. Note: you may see an error that the default port 8080 is already in use. This likely means that someone else currently has that port bound for another task on the front-end node you're using. To fix this, choose another port in a similar range before sourcing the script, e.g. export HTTP_CACHE_PORT=8855 (see the sketch below).
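As a minimal sketch (assuming the standard ss utility is available on the front-end node), you can check whether the default port is free and pick another one before sourcing the script:

# If port 8080 is already bound, choose another port for the cache server.
ss -tln | grep -q ':8080 ' && export HTTP_CACHE_PORT=8855
source tools/offline/use_caches.sh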

5) Disable the system proxy when accessing the locally hosted python server:

export no_proxy=localhost,127.0.0.1 

6) Install LAMMPS

a) Load Modules:

module load cmake
module load spectrum-mpi
module load cuda
module load gcc

We also need to tell Spectrum MPI that we wish to use GCC as our C/C++ compiler; otherwise it will default to XL.

export OMPI_CC=gcc OMPI_CXX=g++
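As a quick, optional sanity check (a sketch; the exact output format may vary), the MPI compiler wrappers should now report GCC rather than IBM XL:

# Should print the GCC version now that OMPI_CC/OMPI_CXX are set.
mpicc --version
mpicxx --version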

b) Configure CMake. Make sure to prepend -D LAMMPS_DOWNLOADS_URL=${HTTP_CACHE_URL} -C "${LAMMPS_HTTP_CACHE_CONFIG}" to your cmake configuration command so LAMMPS knows where to fetch the dependencies. Also supply whichever dependency preset you want; most.cmake is used as the example.

cd <your lammps directory>
mkdir build
cd build
cmake ../cmake -D LAMMPS_DOWNLOADS_URL=${HTTP_CACHE_URL} -C "${LAMMPS_HTTP_CACHE_CONFIG}" -D CMAKE_BUILD_TYPE=Release -D BUILD_MPI=yes -D BUILD_OMP=no -D LAMMPS_MACHINE=AiMOS -D CMAKE_CXX_FLAGS="-O3 -std=c++11" -D PKG_GPU=yes -D GPU_API=cuda -D GPU_PREC=mixed -D GPU_ARCH=sm_70 <YOUR CMAKE OPTIONS> -C ../cmake/presets/most.cmake

c) Build LAMMPS

make -j 8

or

cmake --build .

7) Clean Up: Once everything has completed, remember to delete the .cache/lammps directory and the LAMMPS clone on your local machine; keep the .cache/lammps directory on the CCI system in case you need to re-install LAMMPS. Then deactivate the caches and shut down the Python server we set up on the CCI side.

Deactivate the python server that was set up

deactivate_caches
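On your local machine, the cleanup might look like the following (a sketch; the paths assume the default cache location and that you cloned LAMMPS into your home directory, so adjust them to wherever you actually cloned and created the tarball):

# Local machine only: remove the dependency cache, the tarball, and the clone.
rm -rf ~/.cache/lammps
rm -f ~/.cache/lammps_cache.tar.gz
rm -rf ~/lammps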

8) Test Run:

Request interactive slurm session:

salloc -N 1 --gres=gpu:6 -t 180

ssh into the reserved node and execute the following:

cd ~/barn/lammps
mpirun -np 40 ./build/lmp_AiMOS -sf gpu -v x 10 -v y 10 -v z 10 -v t 200 < ./bench/in.lj

Running on the DCS

You must have passwordless SSH keys setup for mpirun to work.
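If you have not set them up yet, a minimal sketch looks like the following (the key type and filename are just examples; on systems where your home directory is shared across nodes, appending your own public key to authorized_keys is sufficient):

# Generate a key with no passphrase and authorize it for node-to-node SSH.
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys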

More background information is available in the Slurm and DCS Supercomputer articles.

Create a file named submit_run.sh with the following contents:

#!/bin/bash -x

if [ "x$SLURM_NPROCS" = "x" ]
then
  if [ "x$SLURM_NTASKS_PER_NODE" = "x" ]
  then
    SLURM_NTASKS_PER_NODE=1
  fi
  SLURM_NPROCS=`expr $SLURM_JOB_NUM_NODES \* $SLURM_NTASKS_PER_NODE`
else
  if [ "x$SLURM_NTASKS_PER_NODE" = "x" ]
  then
    SLURM_NTASKS_PER_NODE=`expr $SLURM_NPROCS / $SLURM_JOB_NUM_NODES`
  fi
fi

srun hostname -s | sort -u > /tmp/hosts.$SLURM_JOB_ID
grep -q 'release 7\.' /etc/redhat-release
if [ $? -eq 0 ]; then
  net_suffix=-ib
fi
awk "{ print \$0 \"$net_suffix slots=$SLURM_NTASKS_PER_NODE\"; }" /tmp/hosts.$SLURM_JOB_ID >/tmp/tmp.$SLURM_JOB_ID
mv /tmp/tmp.$SLURM_JOB_ID /tmp/hosts.$SLURM_JOB_ID

cat /tmp/hosts.$SLURM_JOB_ID

module load xl_r
module load spectrum-mpi
module load cuda

##Comment out the module load xl_r line and uncomment the two lines below if using GCC instead of XL_r
#module load gcc
#export OMPI_CC=gcc OMPI_CXX=g++

case=YOUR/PATH/TO/lammps/examples/flow/in.flow.pois
exe=YOUR/PATH/TO/lmp_AiMOS

mpirun -hostfile /tmp/hosts.$SLURM_JOB_ID -np $SLURM_NPROCS $exe -sf gpu -pk gpu $SLURM_NTASKS_PER_NODE -in $case > $SLURM_JOB_ID.out

rm /tmp/hosts.$SLURM_JOB_ID

Submit the job using sbatch, ex: sbatch --gres=gpu:6 -n 8 -N 2 -t numberOfMinutes ./submit_run.sh

In this example, 2 nodes will be used to run a total of 8 processes. Therefore, 4 processes will run on each node, each accessing its own GPU. Generally, each process should have its own GPU for MPI applications. This example assumes that you have compiled with the xl_r compiler; if you compiled with GCC instead, remove the module load xl_r line from your sbatch script (and uncomment the GCC lines as noted in the script).
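Following the same pattern, a run that uses all 6 GPUs on each of 2 nodes would request 12 tasks, so 6 land on each node (a sketch; the 60-minute limit is just an example):

sbatch --gres=gpu:6 -n 12 -N 2 -t 60 ./submit_run.sh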

Building without CMake

LAMMPS can technically be built without CMake; however, there appear to be issues with this pathway (as of August 5, 2021) and it is not recommended. The procedure is documented below:

Load modules

 module load xl_r
 module load spectrum-mpi
 module load cuda/10.1 

Build LAMMPS GPU library

 cd src
 make lib-gpu args=" -a sm_70 -p double -b -c /usr/local/cuda-10.1"

Edit makefile

 cp MAKE/Makefile.mpi MAKE/Makefile.dcs

 #Add these lines to MAKE/Makefile.dcs 
 CUDA_HOME = /usr/local/cuda-10.1
 gpu_SYSINC =
 gpu_SYSLIB =  -lcudart -lcuda
 gpu_SYSPATH = -L$(CUDA_HOME)/lib64

Build LAMMPS

 make yes-asphere
 make yes-kspace
 make yes-gpu
 make dcs

Test

./lmp_dcs  -sf gpu -pk gpu 2 -in ../examples/flow/in.flow.pois

Output should list accelerators used

 - Using acceleration for lj/cut:
 -  with 1 proc(s) per device.
 --------------------------------------------------------------------------
 Device 0: Tesla V100-SXM2-16GB, 80 CUs, 15/16 GB, 1.5 GHZ