问题
I'd like to convert Octave to use CuBLAS for matrix multiplication. This video seems to indicate this is as simple as typing 28 characters:
Using CUDA Library to Accelerate Applications
In practice it's a bit more complex than this. Does anyone know what additional work must be done to make the modifications made in this video compile?
UPDATE
Here's the method I'm trying
in dMatrix.cc add
#include <cublas.h>
in dMatrix.cc change all occurences of (preserving case)
dgemm
to
cublas_dgemm
in my build terminal set
export CC=nvcc
export CFLAGS="-lcublas -lcudart"
export CPPFLAGS="-I/usr/local/cuda/include"
export LDFLAGS="-L/usr/local/cuda/lib64"
the error I receive is:
libtool: link: g++ -I/usr/include/freetype2 -Wall -W -Wshadow -Wold-style-cast
-Wformat -Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual -g -O2
-o .libs/octave octave-main.o -L/usr/local/cuda/lib64
../libgui/.libs/liboctgui.so ../libinterp/.libs/liboctinterp.so
../liboctave/.libs/liboctave.so -lutil -lm -lpthread -Wl,-rpath
-Wl,/usr/local/lib/octave/3.7.5
../liboctave/.libs/liboctave.so: undefined reference to `cublas_dgemm_'
回答1:
EDIT2: The method described in this video requires the use of the fortran "thunking library" bindings for cublas. These steps worked for me:
Download octave 3.6.3 from here:
wget ftp://ftp.gnu.org/gnu/octave/octave-3.6.3.tar.gz
extract all files from the archive:
tar -xzvf octave-3.6.3.tar.gz
change into the octave directory just created:
cd octave-3.6.3
make a directory for your "thunking cublas library"
mkdir mycublas
change into that directory
cd mycublas
build the "thunking cublas library"
g++ -c -fPIC -I/usr/local/cuda/include -I/usr/local/cuda/src -DCUBLAS_GFORTRAN -o fortran_thunking.o /usr/local/cuda/src/fortran_thunking.c ar rvs libmycublas.a fortran_thunking.o
switch back to the main build directory
cd ..
run octave's
configure
with additional options:./configure --disable-docs LDFLAGS="-L/usr/local/cuda/lib64 -lcublas -lcudart -L/home/user2/octave/octave-3.6.3/mycublas -lmycublas"
Note that in the above command line, you will need to change the directory for the second
-L
switch to that which matches the path to yourmycublas
directory that you created in step 4Now edit
octave-3.6.3/liboctave/dMatrix.cc
according to the instructions given in the video. It should be sufficient to replace every instance ofdgemm
withcublas_dgemm
and every instance ofDGEMM
withCUBLAS_DGEMM
. In the octave 3.6.3 version I used, there were 3 such instances of each (lower case and upper case).Now you can build octave:
make
(make sure you are in the
octave-3.6.3
directory)
At this point, for me, Octave built successfully. I did not pursue make install
although I assume that would work. I simply ran octave using the ./run-octave
script in the octave-3.6.3
directory.
The above steps assume a proper and standard CUDA 5.0 install. I will try to respond to CUDA-specific questions or issues, but there are any number of problems that may arise with a general Octave install on your platform. I'm not an octave expert and I won't be able to respond to those. I used CentOS 6.2 for this test.
This method, as indicated, involves modification of the C source files of octave.
Another method was covered in some detail in the S3527 session at the GTC 2013 GPU Tech Conference. This session was actually a hands-on laboratory exercise. Unfortunately the materials on that are not conveniently available. However the method there did not involve any modification of GNU Octave source, but instead uses the LD_PRELOAD
capability of Linux to intercept the BLAS library calls and re-direct (the appropriate ones) to the cublas library.
A newer, better method (using the NVBLAS intercept library) is discussed in this blog article
回答2:
I was able to produce a compiled executable using the information supplied. It's a horrible hack, but it works.
The process looks like this:
First produce an object file for fortran_thunking.c
sudo /usr/local/cuda-5.0/bin/nvcc -O3 -c -DCUBLAS_GFORTRAN fortran_thunking.c
Then move that object file to the src
subdirectory in octave
cp /usr/local/cuda-5.0/src/fortran_thunking.o ./octave/src
run make
. The compile will fail on the last step. Change to the src
directory.
cd src
Then execute the failing final line with the addition of ./fortran_thunking.o -lcudart -lcublas
just after octave-main.o
. This produces the following command
g++ -I/usr/include/freetype2 -Wall -W -Wshadow -Wold-style-cast -Wformat
-Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual
-I/usr/local/cuda/include -o .libs/octave octave-main.o
./fortran_thunking.o -lcudart -lcublas -L/usr/local/cuda/lib64
../libgui/.libs/liboctgui.so ../libinterp/.libs/liboctinterp.so
../liboctave/.libs/liboctave.so -lutil -lm -lpthread -Wl,-rpath
-Wl,/usr/local/lib/octave/3.7.5
An octave
binary will be created in the src/.libs
directory. This is your octave executable.
回答3:
In a most recent version of CUDA you don't have to recompile anything. At least as I found in Debian. First, create a config file for NVBLAS (a cuBLAS wrapper). It won't work without it, at all.
tee nvblas.conf <<EOF
NVBLAS_CPU_BLAS_LIB $(dpkg -L libopenblas-base | grep libblas)
NVBLAS_GPU_LIST ALL
EOF
Then use Octave as you would usually do running it with:
LD_PRELOAD=libnvblas.so octave
NVBLAS will do what it can on a GPU while relaying everything else to OpenBLAS.
Further reading:
Benchmark for Octave.
Relevant slides for NVBLAS presentation.
Manual for nvblas.conf
Worth noting that you may not enjoy all the benefits of GPU computing depending on used CPU/GPU: OpenBLAS is quite fast with current multi-core processors. So fast that time spend copying data to GPU, working on it, and copying back could come close to time needed to do the job right on CPU. Check for yourself. Though GPUs are usually more energy efficient.
来源:https://stackoverflow.com/questions/17493270/converting-octave-to-use-cublas