Converting Octave to Use CuBLAS

Question

I'd like to convert Octave to use CuBLAS for matrix multiplication. This video seems to indicate this is as simple as typing 28 characters:

Using CUDA Library to Accelerate Applications

In practice it's a bit more complex than this. Does anyone know what additional work must be done to make the modifications made in this video compile?

UPDATE

Here's the method I'm trying:

In dMatrix.cc, add

#include <cublas.h>

In dMatrix.cc, change all occurrences (preserving case) of

dgemm

to

cublas_dgemm

In my build terminal, set

export CC=nvcc
export CFLAGS="-lcublas -lcudart"
export CPPFLAGS="-I/usr/local/cuda/include"
export LDFLAGS="-L/usr/local/cuda/lib64"

The error I receive is:

libtool: link: g++ -I/usr/include/freetype2 -Wall -W -Wshadow -Wold-style-cast 
-Wformat -Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual -g -O2
-o .libs/octave octave-main.o  -L/usr/local/cuda/lib64 
../libgui/.libs/liboctgui.so ../libinterp/.libs/liboctinterp.so 
../liboctave/.libs/liboctave.so -lutil -lm -lpthread -Wl,-rpath
-Wl,/usr/local/lib/octave/3.7.5

../liboctave/.libs/liboctave.so: undefined reference to `cublas_dgemm_'

Answer 1:

EDIT2: The method described in this video requires the Fortran "thunking library" bindings for cuBLAS. These steps worked for me:

Download octave 3.6.3 from here:

wget ftp://ftp.gnu.org/gnu/octave/octave-3.6.3.tar.gz

extract all files from the archive:
```
tar -xzvf octave-3.6.3.tar.gz
```
change into the octave directory just created:
```
cd octave-3.6.3
```
make a directory for your "thunking cublas library"
```
mkdir mycublas
```
change into that directory
```
cd mycublas
```

build the "thunking cublas library"

g++ -c -fPIC -I/usr/local/cuda/include -I/usr/local/cuda/src -DCUBLAS_GFORTRAN -o fortran_thunking.o /usr/local/cuda/src/fortran_thunking.c
ar rvs libmycublas.a fortran_thunking.o
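
The link error in the question names the symbol `cublas_dgemm_`, so before configuring Octave it may be worth confirming that the archive actually defines it. The following is a minimal sketch, assuming `nm` from binutils and `awk` are available; the helper name `defines_symbol` is made up for illustration and is not part of CUDA or Octave.

```shell
# Hypothetical helper: succeeds if the `nm` output read from stdin lists
# SYMBOL as a defined text symbol (type T/t/W/w) rather than an
# undefined one (type U).
defines_symbol() {
    awk -v sym="$1" '
        NF >= 2 && $NF == sym && $(NF-1) ~ /^[TtWw]$/ { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage, run from the mycublas directory (commented out so that pasting
# this block does nothing on a machine without the archive):
# nm libmycublas.a | defines_symbol cublas_dgemm_ && echo "cublas_dgemm_ is defined"
```

If the check fails, the most likely cause is that `-DCUBLAS_GFORTRAN` was left out of the compile line, since that define is what selects the gfortran-style symbol names in the thunking source.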

switch back to the main build directory
```
cd ..
```
run octave's configure with additional options:
```
./configure --disable-docs LDFLAGS="-L/usr/local/cuda/lib64 -lcublas -lcudart -L/home/user2/octave/octave-3.6.3/mycublas -lmycublas"
```
Note that in the command above you will need to change the directory given to the second -L switch so that it matches the path to the mycublas directory you created earlier.
Now edit octave-3.6.3/liboctave/dMatrix.cc according to the instructions given in the video. It should be sufficient to replace every instance of dgemm with cublas_dgemm and every instance of DGEMM with CUBLAS_DGEMM. In the octave 3.6.3 version I used, there were 3 such instances of each (lower case and upper case).
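
The source edit described above can also be scripted rather than done by hand. This is only a sketch: it assumes GNU sed (for `-i` and the `\<` `\>` word anchors), and the helper name `rename_dgemm` is made up for illustration. It matches whole words only, which is a choice of mine so that running it a second time does not produce `cublas_cublas_dgemm`.

```shell
# Hypothetical helper: prefix whole-word dgemm/DGEMM with cublas_/CUBLAS_
# in the given file. Assumes GNU sed. A copy of the file as it was before
# this run is kept next to it with a .orig suffix (overwritten on re-run).
rename_dgemm() {
    sed -i.orig \
        -e 's/\<dgemm\>/cublas_dgemm/g' \
        -e 's/\<DGEMM\>/CUBLAS_DGEMM/g' \
        "$1"
}

# Usage from the octave-3.6.3 directory (commented out on purpose):
# rename_dgemm liboctave/dMatrix.cc
# grep -n -i 'dgemm' liboctave/dMatrix.cc   # review the rewritten lines
```

Reviewing the `grep` output afterwards is a quick way to confirm that the number of rewritten occurrences matches what you expect for your Octave version.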
Now you can build octave:
```
make
```
(make sure you are in the octave-3.6.3 directory)

At this point, for me, Octave built successfully. I did not pursue make install although I assume that would work. I simply ran octave using the ./run-octave script in the octave-3.6.3 directory.

The above steps assume a proper and standard CUDA 5.0 install. I will try to respond to CUDA-specific questions or issues, but there are any number of problems that may arise with a general Octave install on your platform. I'm not an octave expert and I won't be able to respond to those. I used CentOS 6.2 for this test.

This method, as indicated, involves modification of the C++ source files of Octave.

Another method was covered in some detail in the S3527 session at the GTC 2013 GPU Tech Conference. This session was actually a hands-on laboratory exercise. Unfortunately the materials on that are not conveniently available. However the method there did not involve any modification of GNU Octave source, but instead uses the LD_PRELOAD capability of Linux to intercept the BLAS library calls and re-direct (the appropriate ones) to the cublas library.

A newer, better method (using the NVBLAS intercept library) is discussed in this blog article

Answer 2:

I was able to produce a compiled executable using the information supplied. It's a horrible hack, but it works.

The process looks like this:

First produce an object file for fortran_thunking.c, working in the CUDA src directory (/usr/local/cuda-5.0/src on my machine):

sudo /usr/local/cuda-5.0/bin/nvcc -O3 -c -DCUBLAS_GFORTRAN fortran_thunking.c

Then copy that object file to the src subdirectory of the Octave tree:

cp /usr/local/cuda-5.0/src/fortran_thunking.o ./octave/src

Run make. The build will fail at the final link step. Change to the src directory:

cd src

Then execute the failing final line with the addition of ./fortran_thunking.o -lcudart -lcublas just after octave-main.o. This produces the following command

g++ -I/usr/include/freetype2 -Wall -W -Wshadow -Wold-style-cast -Wformat
 -Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual
 -I/usr/local/cuda/include -o .libs/octave octave-main.o 
./fortran_thunking.o -lcudart -lcublas  -L/usr/local/cuda/lib64 
../libgui/.libs/liboctgui.so ../libinterp/.libs/liboctinterp.so 
../liboctave/.libs/liboctave.so -lutil -lm -lpthread -Wl,-rpath 
-Wl,/usr/local/lib/octave/3.7.5

An octave binary will be created in the src/.libs directory. This is your octave executable.

Answer 3:

With recent versions of CUDA you don't have to recompile anything, at least on Debian, where I tried it. First, create a config file for NVBLAS (a cuBLAS wrapper); it won't work at all without one.

tee nvblas.conf <<EOF
NVBLAS_CPU_BLAS_LIB $(dpkg -L libopenblas-base | grep libblas)
NVBLAS_GPU_LIST ALL
EOF

Then use Octave as usual, launching it with:

LD_PRELOAD=libnvblas.so octave

NVBLAS will do what it can on a GPU while relaying everything else to OpenBLAS.
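
The `$(dpkg -L ... | grep libblas)` substitution in the config step may expand to nothing or to several lines, depending on how the package is laid out, and either case would leave a malformed config. Below is a more defensive sketch that takes the CPU BLAS path explicitly. The helper name `write_nvblas_conf` and the `/path/to/libopenblas.so` placeholder are assumptions for illustration; `NVBLAS_CPU_BLAS_LIB`, `NVBLAS_GPU_LIST`, and the `NVBLAS_CONFIG_FILE` environment variable are NVBLAS's own names.

```shell
# Hypothetical helper: write an NVBLAS config to OUT that sends CPU
# fallbacks to the BLAS library at CPU_BLAS and uses every visible GPU.
# Refuses to write anything if the library path does not exist.
write_nvblas_conf() {
    cpu_blas=$1
    out=$2
    if [ ! -e "$cpu_blas" ]; then
        echo "CPU BLAS library not found: $cpu_blas" >&2
        return 1
    fi
    {
        echo "NVBLAS_CPU_BLAS_LIB $cpu_blas"
        echo "NVBLAS_GPU_LIST ALL"
    } > "$out"
}

# Usage (commented out; replace the placeholder with the real library
# path on your system):
# write_nvblas_conf /path/to/libopenblas.so "$HOME/nvblas.conf"
# NVBLAS_CONFIG_FILE="$HOME/nvblas.conf" LD_PRELOAD=libnvblas.so octave
```

Setting `NVBLAS_CONFIG_FILE` means Octave does not have to be started from the directory that contains `nvblas.conf`, which is where NVBLAS looks by default.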