Question
I have a program which I'm compiling like this:
(...) Some ifort *.f -c
nvcc -c src/bicgstab.cu -o bicgstab.o -I/home/ricardo/apps/cusp/cusplibrary
(...) Some more *.for -c
ifort *.o -L/usr/local/cuda-5.5/lib64 -lcudart -lcublas -lcusparse -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -o program
Everything worked fine until I added CUSP support, for which I have this wrapper (bicgstab.cu):
#include <cusp/csr_matrix.h>
#include <cusp/krylov/bicgstab.h>
#if defined(__cplusplus)
extern "C" {
#endif
void bicgstab_(int * device_I, int * device_J, float * device_V, float * device_x, float * device_b, int N, int NNZ){
    // *NOTE* raw pointers must be wrapped with thrust::device_ptr!
    thrust::device_ptr<int> wrapped_device_I(device_I);
    thrust::device_ptr<int> wrapped_device_J(device_J);
    thrust::device_ptr<float> wrapped_device_V(device_V);
    thrust::device_ptr<float> wrapped_device_x(device_x);
    thrust::device_ptr<float> wrapped_device_b(device_b);
    // use array1d_view to wrap the individual arrays
    typedef typename cusp::array1d_view< thrust::device_ptr<int> > DeviceIndexArrayView;
    typedef typename cusp::array1d_view< thrust::device_ptr<float> > DeviceValueArrayView;
    DeviceIndexArrayView row_indices   (wrapped_device_I, wrapped_device_I + (N+1));
    DeviceIndexArrayView column_indices(wrapped_device_J, wrapped_device_J + NNZ);
    DeviceValueArrayView values        (wrapped_device_V, wrapped_device_V + NNZ);
    DeviceValueArrayView x             (wrapped_device_x, wrapped_device_x + N);
    DeviceValueArrayView b             (wrapped_device_b, wrapped_device_b + N);
    // combine the three array1d_views into a csr_matrix_view
    typedef cusp::csr_matrix_view<DeviceIndexArrayView,
                                  DeviceIndexArrayView,
                                  DeviceValueArrayView> DeviceView;
    // construct a csr_matrix_view from the array1d_views
    DeviceView A(N, N, NNZ, row_indices, column_indices, values);
    // set stopping criteria:
    // iteration_limit = 100
    // relative_tolerance = 1e-5
    cusp::verbose_monitor<float> monitor(b, 100, 1e-5);
    // solve the linear system A * x = b with the BiCGStab method
    cusp::krylov::bicgstab(A, x, b, monitor);
}
#if defined(__cplusplus)
}
#endif
nvcc compiles the file and generates the object, but the last command, which links everything together, produces a long list of unresolved-symbol warnings:
ipo: warning #11021: unresolved __gxx_personality_v0
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZTVSt9exception
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZTVSt9bad_alloc
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZdlPv
Referenced in bicgstab.o
ipo: warning #11021: unresolved __cxa_guard_acquire
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSaIcEC1Ev
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSsC1EPKcRKSaIcE
Referenced in bicgstab.o
ipo: warning #11021: unresolved __cxa_guard_release
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSsD1Ev
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSaIcED1Ev
Referenced in bicgstab.o
ipo: warning #11021: unresolved __cxa_guard_abort
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSsC1ERKSs
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSt13runtime_errorD2Ev
Referenced in bicgstab.o
ipo: warning #11021: unresolved __cxa_call_unexpected
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSt13runtime_errorC2ERKSs
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSsC1Ev
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNKSs5emptyEv
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNKSt13runtime_error4whatEv
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSsaSEPKc
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSspLEPKc
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSspLERKSs
Referenced in bicgstab.o
ipo: warning #11021: unresolved __cxa_begin_catch
Referenced in bicgstab.o
ipo: warning #11021: unresolved __cxa_end_catch
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNKSs5c_strEv
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNKSt9bad_alloc4whatEv
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSt9bad_allocD2Ev
Referenced in bicgstab.o
ipo: warning #11021: unresolved __cxa_allocate_exception
Referenced in bicgstab.o
ipo: warning #11021: unresolved __cxa_free_exception
Referenced in bicgstab.o
ipo: warning #11021: unresolved __cxa_throw
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSt9exceptionD2Ev
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZSt4cout
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSolsEf
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSolsEm
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSolsEPFRSoS_E
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZSt9terminatev
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZStlsIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_St5_Setw
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSolsEPFRSt8ios_baseS0_E
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSt9bad_allocD1Ev
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZTISt9bad_alloc
Referenced in bicgstab.o
ipo: warning #11021: unresolved __cxa_pure_virtual
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZTVN10__cxxabiv120__si_class_type_infoE
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZTISt9exception
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZTISt13runtime_error
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZTVN10__cxxabiv117__class_type_infoE
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSt8ios_base4InitC1Ev
Referenced in bicgstab.o
ipo: warning #11021: unresolved _ZNSt8ios_base4InitD1Ev
Referenced in bicgstab.o
I believe this is because ifort is adding or removing underscores, changing letter case, or something similar, because the file compiles correctly and, if I build the binary outside my program just for testing, it works fine.
Thank you very much in advance!
Answer 1:
ipo is fairly complicated when there are multiple files involved. It's actually rerunning the compiler on all modules at link time. I'm not an expert on this, but that sounds like something fairly difficult to wade through.
One possible option is to compile your CUDA code into a shared library (.so) and link against that. It should prevent the Intel compiler toolchain from trying to recompile and optimize against the code generated by nvcc/gcc. I think this is going to limit you to "single file optimizations". I don't know whether that will significantly affect your performance.
Using my example here, I would modify the compile commands as follows:
$ nvcc -Xcompiler="-fPIC" -shared bicgstab.cu -o bicgstab.so -I/home-2/robertc/misc/cusp/cusplibrary-master
$ ifort -c -fast bic.f90
$ ifort bic.o bicgstab.so -L/shared/apps/cuda/CUDA-v6.0.37/lib64 -lcudart -o program
ipo: remark #11001: performing single-file optimizations
ipo: remark #11006: generating object file /tmp/ipo_ifortxEdpin.o
$
You don't indicate where in your compile process you are adding the -fast switch(es). If only on the ifort compile commands, I believe the above approach will work. If you also want/need it on the link command, then it appears that ifort wants to build an entirely statically linked executable (and do intermodule optimization...), which won't work using the above process.
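As a side check on the names in the warning list, it may help to sort them by prefix. Names beginning with _Z are mangled C++ names, and the __cxa_ and __gxx_ names belong to the C++ runtime support layer, which suggests the unresolved references come from the C++ standard library that nvcc's host compiler expects, rather than from Fortran underscore or case conventions. The following is only a minimal sketch: classify_symbol is a hypothetical helper written for this illustration, and the demangling step assumes that c++filt from binutils is installed.

```shell
#!/bin/sh
# classify_symbol is a hypothetical helper (not part of any toolchain):
# it labels a linker symbol by its prefix only.
classify_symbol() {
    case "$1" in
        __cxa_*|__gxx_*) echo "C++ ABI runtime support" ;;
        _Z*)             echo "mangled C++ name" ;;
        *)               echo "other" ;;
    esac
}

# A few names copied from the warning list in the question.
for sym in __gxx_personality_v0 __cxa_throw _ZdlPv _ZSt4cout _ZSt9terminatev; do
    printf '%s: %s\n' "$sym" "$(classify_symbol "$sym")"
done

# Assumption: binutils' c++filt is available; if so, show readable names too.
if command -v c++filt >/dev/null 2>&1; then
    printf '%s\n' _ZdlPv _ZSt4cout _ZSt9terminatev | c++filt
fi
```

If the names do turn out to be C++ runtime symbols, one thing to try is adding -lstdc++ to the ifort link line. This is an assumption and has not been verified against this particular build.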
Source: https://stackoverflow.com/questions/24462580/unresolved-references-using-ifort-with-nvcc-and-cusp