Debugging¶
GDB¶
Instructions for running GDB on compute-node jobs are located in the Blue Gene/Q Application Development Redbook on page 105.
Debugging from startup¶
The following commands are for debugging a two-process job from startup.
1. Create an interactive allocation: salloc -p debug -n 2 -t 60
2. Load the gdb module: module load gdb
3. Run the executable within gdbtool: srun --runjob-opts="--start-tool `which gdbtool`" --ntasks-
   The following should be output: Enter a rank to see its associated I/O node's IP address, or press enter to start the job:
   Do not hit enter.
4. Open a new terminal on the BGQ front end node and connect gdb to the rank 0 process:
   - Load the gdb module: module load gdb
   - Attach GDB to the rank 0 process: pgdb 0
5. Repeat step 4 for connecting gdb to the rank 1 process by passing '1' to pgdb instead of '0'.
6. In the terminal where the 'srun' command was executed, press enter to start running your application.
7. The application can now be debugged from the two terminals running gdb.
There are two critical steps for using gdb on BGQ: running the gdb server via gdbtool, and attaching the gdb client via pgdb. A maximum of four ranks can be debugged.
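As a rough aid for the workflow above, the following sketch prints the commands to run in each additional terminal for a given number of ranks, and rejects requests beyond the four-rank limit. The function name print_attach_commands and the variable MAX_DEBUG_RANKS are hypothetical helpers made up for illustration; the script only prints text and does not call pgdb itself.

```shell
#!/bin/sh
# Sketch only: prints the per-terminal attach commands, it does not run them.
# MAX_DEBUG_RANKS and print_attach_commands are hypothetical names.
MAX_DEBUG_RANKS=4   # limit stated in the text above

print_attach_commands() {
    nranks=$1
    # Reject rank counts outside 1..MAX_DEBUG_RANKS.
    if [ "$nranks" -lt 1 ] || [ "$nranks" -gt "$MAX_DEBUG_RANKS" ]; then
        echo "can debug between 1 and $MAX_DEBUG_RANKS ranks, got $nranks" >&2
        return 1
    fi
    rank=0
    while [ "$rank" -lt "$nranks" ]; do
        # One terminal per rank: load the module, then attach to that rank.
        echo "terminal $((rank + 1)): module load gdb && pgdb $rank"
        rank=$((rank + 1))
    done
}

# Example: the two-process job described above.
print_attach_commands 2
```

For the two-process case this lists one line for rank 0 and one for rank 1. Press enter in the srun terminal only after every listed gdb client has attached.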
Disconnect GDB¶
To disconnect GDB from a process, enter the following at the gdb prompt:
disconnect
Debugging Hung Processes¶
Attaching GDB to a running process¶
If a job appears to be hanging, you can use the following commands to attach gdb to its processes:
- Load the gdb module: module load experimental/gdb
- Attach GDB to an MPI rank: pgdb -t
Note that the same limit applies when running GDB this way: a maximum of four ranks can be debugged simultaneously.
Forcing core files to be written¶
The following assumes you have only one job currently running on AMOS. First, run the following to get the job's ID. Note that this is not the same ID that Slurm uses.
list_jobs -l | awk '/ID/ {print $2}'
Run the following command to send signal 11 (SIGSEGV, the 'segfault' signal) to the job.
kill_job -s 11 --id
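To make the awk filter above easier to follow, here is a minimal sketch of what it does: it keeps lines containing "ID" and prints the second whitespace-separated field. The function names extract_job_id and single_id_or_fail are hypothetical, and the sample line is made up purely to show the field selection; it is not real list_jobs output. The second helper enforces the one-running-job assumption stated above.

```shell
#!/bin/sh
# Sketch only: extract_job_id and single_id_or_fail are hypothetical helpers.

# Same filter as in the text: for lines containing "ID", print field 2.
extract_job_id() {
    awk '/ID/ {print $2}'
}

# Print the ID only if exactly one was found; otherwise fail.
single_id_or_fail() {
    ids=$(extract_job_id)
    # Count non-empty lines; "|| true" keeps a zero count from being an error.
    count=$(printf '%s\n' "$ids" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one job ID, found $count" >&2
        return 1
    fi
    printf '%s\n' "$ids"
}

# Made-up sample line (NOT real list_jobs output), for illustration only.
printf 'ID: 12345\n' | single_id_or_fail
```

On the front end node, the same helper could presumably be fed from the real command, for example job_id=$(list_jobs -l | single_id_or_fail), so that the signal is only sent when a single job ID was found.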