nvidia

The difference between running time and the time of obtaining results in CUDA

淺唱寂寞╮ submitted on 2020-01-17 14:05:05
Question: I am trying to implement my algorithm on the GPU using CUDA. The program works well, but there is a problem: when I try to print out the results, they appear too late. Here is some of my code (assume that the correctness of the results is not the issue):

    __device__ unsigned char dev_state[128];

    __device__ unsigned char GMul(unsigned char a, unsigned char b) {
        // Galois Field (256) Multiplication of two Bytes
        unsigned char p = 0;
        int counter;
        unsigned char hi_bit_set;
        for (counter = 0; counter < 8; counter++) {
            if (
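For reference, the truncated GMul routine matches the standard Rijndael GF(2^8) "Russian peasant" multiply; a minimal self-contained sketch follows (the 0x1B reduction polynomial is the AES one and is an assumption here, since the snippet cuts off before the loop body):

    __device__ unsigned char GMul(unsigned char a, unsigned char b) {
        // Galois Field (256) multiplication of two bytes.
        unsigned char p = 0;
        for (int counter = 0; counter < 8; counter++) {
            if (b & 1)
                p ^= a;                      // add a into the product for this bit of b
            unsigned char hi_bit_set = a & 0x80;
            a <<= 1;                         // multiply a by x
            if (hi_bit_set)
                a ^= 0x1B;                   // reduce modulo x^8 + x^4 + x^3 + x + 1 (assumed)
            b >>= 1;
        }
        return p;
    }

As for the results appearing "too late": kernel launches are asynchronous, so the wall-clock cost typically shows up only at the first synchronizing call (cudaMemcpy or cudaDeviceSynchronize) before printing, not at the launch itself.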

OSX Sierra TensorFlow build error: ld: file not found: @rpath/CUDA.framework/Versions/A/CUDA

社会主义新天地 submitted on 2020-01-17 07:08:05
Question: I have followed the instructions in https://gist.github.com/notilas/a30e29ce514970e821a34153c1e78b3f but cannot complete the build.

    OSX: Sierra
    TensorFlow version: 1.1.0 (Google says v1.2 does not support CUDA on OSX)
    CUDA Toolkit: 8.0
    cuDNN: 6.0
    Xcode: 7.2.1
    Anaconda: 4.2 (Python version 3.5)

Error log:

    ERROR: /Users/so041e/ml/tensorflow/tensorflow/python/BUILD:2534:1: Linking of rule '//tensorflow/python:_pywrap_tensorflow_internal.so' failed: link_dynamic_library.sh failed: error executing

Access/synchronization to local memory

☆樱花仙子☆ submitted on 2020-01-17 06:22:29
Question: I'm pretty new to GPGPU programming. I'm trying to implement an algorithm that needs a lot of synchronization, so it uses only one work-group (the global and local sizes have the same value). I have the following problem: my program works correctly until the problem size exceeds 32.

    __kernel void assort(
        __global float *array,
        __local float *currentOutput,
        __local float *stimulations,
        __local int *noOfValuesAdded,
        __local float *addedValue,
        __local float *positionToInsert,
        __local int *activatedIdx,
        _
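One common cause of code that works up to a problem size of 32 and breaks beyond it is relying on the lockstep execution of a single NVIDIA warp (32 work-items) instead of explicit barriers. A minimal sketch of correct local-memory synchronization in a single work-group follows; the kernel name, arguments, and the power-of-two work-group size are assumptions for illustration, not the poster's kernel:

    // Hypothetical local reduction in one work-group; assumes local size is a power of two.
    __kernel void reduce_local(__global const float *in,
                               __local float *scratch,
                               __global float *out)
    {
        int lid  = get_local_id(0);
        int size = get_local_size(0);
        scratch[lid] = in[lid];
        barrier(CLK_LOCAL_MEM_FENCE);          // every work-item must reach this barrier
        for (int stride = size / 2; stride > 0; stride /= 2) {
            if (lid < stride)
                scratch[lid] += scratch[lid + stride];
            barrier(CLK_LOCAL_MEM_FENCE);      // outside the if: all work-items hit it
        }
        if (lid == 0)
            *out = scratch[0];
    }

The key rule is that barrier() must be reached by every work-item in the work-group; placing it inside a divergent branch is undefined behavior.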

CUDA block synchronization differences between GTS 250 and Fermi devices

老子叫甜甜 submitted on 2020-01-17 05:48:15
Question: I've been working on a program in which I create a hash table in global memory. The code is completely functional (albeit slower) on a GTS 250, which is a compute capability 1.1 device. However, on a compute capability 2.0 device (C2050 or C2070) the hash table is corrupted (the data is incorrect and the pointers are sometimes wrong). Basically, the code works fine when only one block is used (on both devices). However, when two or more blocks are used, it works only on the GTS 250 and not on any Fermi device. I understand
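A frequent culprit for exactly this symptom (correct on compute 1.1, corrupt on Fermi with multiple blocks) is that Fermi routes plain global loads through per-SM L1 caches that are not coherent across blocks, so unsynchronized reads can see stale data. Below is a hedged sketch of a hash-table insert that claims slots with atomicCAS, which sidesteps that hazard; the table layout, the EMPTY sentinel, and linear probing are illustrative assumptions, not the poster's design:

    // Hypothetical open-addressing insert into a global-memory hash table.
    __device__ void insertKey(unsigned int *table, unsigned int capacity,
                              unsigned int key)
    {
        const unsigned int EMPTY = 0xFFFFFFFFu;      // assumed empty-slot sentinel
        unsigned int slot = key % capacity;
        while (true) {
            // atomicCAS claims the slot atomically and is visible to all blocks,
            // unlike a plain store that may linger in a non-coherent L1 cache.
            unsigned int prev = atomicCAS(&table[slot], EMPTY, key);
            if (prev == EMPTY || prev == key)
                return;                              // claimed, or key already present
            slot = (slot + 1) % capacity;            // linear probing
        }
    }

Alternatives in the same spirit are declaring the shared structures volatile or compiling with -Xptxas -dlcm=cg to bypass L1 for global accesses.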

What can I do about 'CUDA driver version is insufficient for CUDA runtime version'?

∥☆過路亽.° submitted on 2020-01-16 12:02:18
Question: When I go to /usr/local/cuda/samples/1_Utilities/deviceQuery and execute

    moose@pc09 /usr/local/cuda/samples/1_Utilities/deviceQuery $ sudo make clean
    rm -f deviceQuery deviceQuery.o
    rm -rf ../../bin/x86_64/linux/release/deviceQuery
    moose@pc09 /usr/local/cuda/samples/1_Utilities/deviceQuery $ sudo make
    "/usr/local/cuda-7.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch
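Independent of the build step, the mismatch behind this message can be inspected directly: the CUDA runtime API reports both the CUDA version the installed driver supports and the version a binary was built against. A minimal sketch (the file name is arbitrary):

    // version_check.cu -- compile with: nvcc version_check.cu -o version_check
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int driverVersion = 0, runtimeVersion = 0;
        cudaDriverGetVersion(&driverVersion);    // highest CUDA version the installed driver supports
        cudaRuntimeGetVersion(&runtimeVersion);  // CUDA version this binary was built against
        printf("Driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
               driverVersion / 1000, (driverVersion % 1000) / 10,
               runtimeVersion / 1000, (runtimeVersion % 1000) / 10);
        return 0;
    }

The error in the title means the first number is lower than the second, i.e. the display driver is older than the toolkit the program was built with.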

AMD CPU versus Intel CPU for OpenCL

纵饮孤独 submitted on 2020-01-14 19:14:41
Question: Some friends and I want to use OpenCL. For this we are looking to buy a new computer, and we wondered which is better for OpenCL, AMD or Intel. The graphics card will be an Nvidia one, and we have no choice about the graphics card, so we started out wanting to buy an Intel CPU, but after some research we found that AMD CPUs may be better with OpenCL. We didn't find benchmarks comparing the two. So here are our questions: Is AMD better than Intel with OpenCL? Does it matter to have an Nvidia
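Whatever CPU is chosen, it is easy to check which OpenCL runtimes a machine actually exposes, since Intel, AMD, and Nvidia each ship their own platform. A minimal host-side sketch (the header path differs on macOS, where it is <OpenCL/opencl.h>):

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_uint count = 0;
        clGetPlatformIDs(0, NULL, &count);           // how many platforms are installed
        cl_platform_id platforms[16];
        clGetPlatformIDs(count < 16 ? count : 16, platforms, NULL);
        for (cl_uint i = 0; i < count && i < 16; i++) {
            char name[256];
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                              sizeof(name), name, NULL);
            printf("Platform %u: %s\n", i, name);    // e.g. a CPU runtime vs. the Nvidia platform
        }
        return 0;
    }

A CPU runtime and the Nvidia GPU platform can coexist, so the CPU vendor mainly determines which CPU OpenCL implementation is available, not whether the Nvidia card can be used.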

Where to download the CUDA SDK from

好久不见. submitted on 2020-01-14 08:49:05
Question: I have been searching the Nvidia website for the GPU Computing SDK, as I am trying to build the Point Cloud Library (PCL) with CUDA support. However, on the Nvidia website all I can find are links for the toolkit, and not a single download link for the SDK. I found this post: How can I download the latest version of the GPU computing SDK? However, that solution seems outdated and does not seem to work.

Answer 1: The link that fritzone gave (https://developer.nvidia.com/cuda-downloads) is an