
Debugging

GDB

Instructions for running GDB on compute-node jobs are located in the Blue Gene/Q Application Development Redbook on page 105.

Debugging from startup

The following commands debug a two-process job from startup.

  1. Create an interactive allocation: salloc -p debug -n 2 -t 60

  2. Load the gdb module: module load gdb

  3. Run the executable within gdbtool: srun --runjob-opts="--start-tool which gdbtool " --ntasks

    • The following prompt should appear: Enter a rank to see its associated I/O node's IP address, or press enter to start the job:

    • Do not press enter yet; the job stays paused at this prompt so that gdb can be attached in step 4.

  4. Open a new terminal on the BGQ front end node and connect gdb to the rank 0 process:

    • Load the gdb module: module load gdb

    • Attach GDB to the rank 0 process: pgdb 0

  5. Repeat step 4 in another terminal to connect gdb to the rank 1 process, passing '1' to pgdb instead of '0'.

  6. In the terminal where the 'srun' command was executed, press enter to start running your application.

The application can now be debugged from the two terminals running gdb.

There are two critical steps for using gdb on BGQ: running the gdb server via gdbtool, and attaching the gdb client via pgdb. A maximum of four ranks can be debugged simultaneously.
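
For jobs with more than one rank, it can help to list the attach commands up front. The following is a minimal sketch, not part of the BGQ tooling: the function name `pgdb_attach_commands` is hypothetical, and it only prints the `pgdb` commands (one per rank, each meant for its own terminal) rather than running them. It refuses rank counts above the four-rank limit stated above.

```shell
# Hypothetical helper: print one "pgdb <rank>" command per rank.
# It does not run pgdb; each printed command goes in its own terminal.
MAX_RANKS=4   # limit stated in this section

pgdb_attach_commands() {
  nranks=$1
  # Reject empty or non-numeric input.
  case $nranks in
    ''|*[!0-9]*)
      echo "rank count must be a positive integer" >&2
      return 1
      ;;
  esac
  # Enforce the documented limit on simultaneously debugged ranks.
  if [ "$nranks" -lt 1 ] || [ "$nranks" -gt "$MAX_RANKS" ]; then
    echo "rank count must be between 1 and $MAX_RANKS" >&2
    return 1
  fi
  rank=0
  while [ "$rank" -lt "$nranks" ]; do
    echo "pgdb $rank"
    rank=$((rank + 1))
  done
}

# For the two-process job used in this section:
pgdb_attach_commands 2
```

For the two-process example this prints `pgdb 0` and `pgdb 1`, matching the commands used in the steps above.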

Disconnect GDB

To disconnect GDB from a process enter the following at the gdb prompt:

disconnect

Debugging Hung Processes

Attaching GDB to a running process

If a job appears to be hanging, you can use the following commands to attach gdb to the processes:

  1. Load the gdb module: module load experimental/gdb

  2. Attach GDB to an MPI rank: pgdb -t

Note that the same limit applies when running GDB this way: a maximum of four ranks can be debugged simultaneously.

Forcing core files to be written

The following assumes you have only one job currently running on AMOS. First, run the following to get the job id. Note that this is not the same id that Slurm uses.

list_jobs -l | awk '/ID/ {print $2}'
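
Because the command above prints one id per matching job, the single-job assumption can be checked before the id is used. The sketch below is a hypothetical guard (the function name `single_job_id` is not part of the BGQ or Slurm tooling): it reads ids on standard input and prints the id only when exactly one was supplied. The ids in the example call are made-up stand-ins, not real `list_jobs` output.

```shell
# Hypothetical guard: read candidate job ids from stdin (one per line)
# and print the id only when exactly one non-blank line was supplied.
single_job_id() {
  awk 'NF { n++; id = $1 }
       END {
         if (n == 1) { print id; exit 0 }
         printf "expected exactly one job id, found %d\n", n > "/dev/stderr"
         exit 1
       }'
}

# Made-up id standing in for the output of the list_jobs pipeline.
printf '12345\n' | single_job_id
```

On the front end node the guard would be placed at the end of the pipeline: `list_jobs -l | awk '/ID/ {print $2}' | single_job_id`. A non-zero exit status then signals that zero or several jobs were found.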

Run the following command to send the 'segfault' signal (signal 11, SIGSEGV) to the job.

kill_job -s 11 --id

**OR** using Slurm:

`scancel --signal=ABRT slurmJobId`

Core Files
----------

Add the following option to the srun command to enable the output of core files:

`--runjob-opts="--envs BG_COREDUMPDISABLED=0 BG_COREDUMPONEXIT=1 "`

Core files can be viewed with the coreprocessor tool:

`/bgsys/drivers/ppcfloor/coreprocessor/bin/coreprocessor.pl -b= -c=/path/to/directory/with/core/files`

More info on the coreprocessor tool is in the [Administration Redbook](http://www.redbooks.ibm.com/redbooks/pdfs/sg247869.pdf).

### Text based core processing tool

Create a file named 'getStack.sh' with the following contents:

    #!/bin/bash -e
    # Expects two arguments: the core file, then the executable that produced it.
    expectedArgs=2
    if [ $# -ne $expectedArgs ]; then
      echo "Usage: $0 "
      exit 1
    fi
    corefile=$1
    # core.NNN becomes stack.NNN
    stackfile=stack${corefile##core}
    exe=$2
    echo input: $corefile
    echo output: $stackfile
    # Find the line numbers of the STACK markers in the core file.
    grep -n STACK $corefile | awk -F : '{print $1}' > lines
    let s=`head -n 1 lines`+2
    let f=`tail -n -1 lines`-1
    # Extract the address column and rewrite the zero padding as a 0x prefix.
    sed -n ${s},${f}p $corefile | awk '{print $2}' | perl -pe 's/000000000/0x/g' > core.addy
    # Translate the addresses to source locations.
    addr2line -e $exe < core.addy > $stackfile
    rm lines
    rm core.addy

Make it executable:

    chmod +x getStack.sh

To generate a file with the stack trace from a given core file, run:

    ./getStack.sh core.##### /path/to/executable/that/generated/core/files/foo.exe
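
The output file name in getStack.sh comes from the parameter expansion `${corefile##core}`, which strips a leading `core` from the core file name. The sketch below isolates that one step; the function name `stack_name` and the sample file names are illustrative only.

```shell
# Illustrative only: reproduce how getStack.sh derives its output name.
stack_name() {
  corefile=$1
  # "##core" removes the prefix "core" when the value starts with it.
  echo "stack${corefile##core}"
}

stack_name core.0      # prints stack.0
stack_name core.127    # prints stack.127
stack_name ./core.0    # prints stack./core.0 (prefix not at the start)
```

Because the pattern only matches at the start of the string, the script is best run from the directory that holds the core files, with a bare file name as its first argument.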