I had a quick look on the forums and I don't think this question has been asked already.
I am currently working with an MPI/CUDA hybrid code, made by somebody else.
Apparently, since 2015 it has been possible to auto-annotate MPI calls via NVTX and the mpi_interceptions.so library when using the nvprof profiler:
https://devblogs.nvidia.com/gpu-pro-tip-track-mpi-calls-nvidia-visual-profiler/
http://on-demand.gputechconf.com/gtc/2017/presentation/s7495-jain-optimizing-application-performance-cuda-profiling-tools.pdf
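For context, the blog post above enables this from the nvprof command line. A minimal sketch, assuming a recent nvprof (the --annotate-mpi switch appeared well after CUDA 5.0 and is not available in older releases; my_mpi_app is a placeholder name):
# hedged: --annotate-mpi requires a newer nvprof; the argument names the MPI flavor
mpirun -np 2 nvprof --annotate-mpi openmpi ./my_mpi_app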
TAU still does not support distributed deep learning according to this presentation:
http://on-demand.gputechconf.com/gtc/2017/presentation/s7684-allen-malony-performance-analysis-of-cuda-deep-learning-networks-using-tau.pdf
Take a look at nvprof, part of the CUDA 5.0 Toolkit (currently available as a release candidate). There are some limitations: it can only collect a limited number of counters in a given pass, and it cannot collect metrics, so for now you'd have to script multiple launches if you want more than a few events. You can get more information from the nvvp built-in help, including an example MPI launch script (copied here, but I suggest you check the nvvp help for an up-to-date version if you have anything newer than the 5.0 RC).
#!/bin/sh
#
# Script to launch nvprof on an MPI process. This script will
# create unique output file names based on the rank of the
# process. Examples:
# mpirun -np 4 nvprof-script a.out
# mpirun -np 4 nvprof-script -o outfile a.out
# mpirun -np 4 nvprof-script test/a.out -g -j
# If you want to pass a -o or -h flag to a.out itself, use the -c
# switch:
# mpirun -np 4 nvprof-script -c a.out -h -o
# You can also pass in arguments to nvprof
# mpirun -np 4 nvprof-script --print-api-trace a.out
#
usage () {
    echo "nvprof-script [nvprof options] [-h] [-o outfile] a.out [a.out options]"
    echo "or"
    echo "nvprof-script [nvprof options] [-h] [-o outfile] -c a.out [a.out options]"
}

nvprof_args=""
while [ $# -gt 0 ]; do
    case "$1" in
        (-o) shift; outfile="$1";;
        (-c) shift; break;;
        (-h) usage; exit 1;;
        (*)  nvprof_args="$nvprof_args $1";;
    esac
    shift
done

# If the user did not provide an output filename then create one
if [ -z "$outfile" ] ; then
    outfile=`basename $1`.nvprof-out
fi

# Find the rank of the process from the MPI rank environment variable
# to ensure unique output filenames. The script handles Open MPI
# and MVAPICH. If your implementation is different, you will need to
# make a change here.

# Open MPI
if [ ! -z "${OMPI_COMM_WORLD_RANK}" ] ; then
    rank=${OMPI_COMM_WORLD_RANK}
fi

# MVAPICH
if [ ! -z "${MV2_COMM_WORLD_RANK}" ] ; then
    rank=${MV2_COMM_WORLD_RANK}
fi

# Set the nvprof command and arguments; "$@" keeps the remaining
# application arguments intact.
NVPROF="nvprof --output-profile $outfile.$rank $nvprof_args"
exec $NVPROF "$@"

# If you want to limit which ranks get profiled, do something like
# this. You have to use the -c switch to get the right behavior.
# mpirun -np 2 nvprof-script --print-api-trace -c a.out -q
# if [ "$rank" -le 0 ]; then
#     exec $NVPROF "$@"
# else
#     exec "$@"
# fi
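Since this nvprof version can only gather a few counters per pass, multiple launches can be scripted on top of the wrapper. A minimal sketch, assuming the wrapper script above is on your PATH; the event names are illustrative only (list valid ones for your GPU with nvprof --query-events):
# hedged sketch: one profiling launch per event; the event names below
# are examples only, query your device for the real ones
for ev in inst_executed gld_request gst_request; do
    mpirun -np 4 nvprof-script --events $ev -o trace.$ev a.out
done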
Things have changed since CUDA 5.0: instead of using a wrapper script, we can now simply use the %h, %p and %q{ENV} placeholders in the output file name, as mentioned here:
$ mpirun -np 2 -host c0-0,c0-1 nvprof -o output.%h.%p.%q{OMPI_COMM_WORLD_RANK} ./my_mpi_app
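The same pattern works for other MPI implementations by swapping in their rank variable, e.g. for MVAPICH2, using the MV2_COMM_WORLD_RANK variable the wrapper script above already checks:
$ mpirun -np 2 -host c0-0,c0-1 nvprof -o output.%h.%p.%q{MV2_COMM_WORLD_RANK} ./my_mpi_app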
Another option: since you are already using TAU to profile the CPU side of the application, you could also use TAU to collect the GPU performance data. TAU supports multi-GPU execution along with MPI; take a look at http://www.nic.uoregon.edu/tau-wiki/Guide:TAUGPU for instructions on how to get started with TAU's GPU profiling capabilities. TAU uses CUPTI (the CUDA Performance Tools Interface) underneath, so the data you will be able to collect with TAU will be very similar to what you can collect with NVIDIA's Visual Profiler.
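A minimal sketch of such a run, assuming a TAU installation built with MPI and CUPTI support (the tau_exec -T tags below are an assumption and depend on how your TAU was configured; check the guide above for the exact invocation):
# hedged: requires TAU built with MPI and CUDA/CUPTI support; the -T tags
# are assumptions, adjust them to match your installation's configurations
mpirun -np 4 tau_exec -T mpi,cupti -cupti ./my_mpi_app
pprof   # TAU's text viewer for the per-rank profiles just collected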