nvidia | 易学教程 (Easy Learning Tutorials)

The difference between running time and the time of obtaining results in CUDA

Question: I am trying to implement my algorithm on the GPU using CUDA. The program works well, but there is a problem: when I try to print out the results, they are shown too late. Here is some of my code (assume the correctness of the results does not matter):

__device__ unsigned char dev_state[128];

__device__ unsigned char GMul(unsigned char a, unsigned char b) {
    // Galois Field (256) Multiplication of two Bytes
    unsigned char p = 0;
    int counter;
    unsigned char hi_bit_set;
    for (counter = 0; counter < 8; counter++) {
        if (
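
The excerpt cuts off inside GMul. For reference, a common shift-and-XOR ("Russian peasant") form of GF(2^8) multiplication is sketched below in plain C. It assumes the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), which the truncated excerpt does not confirm, and the lowercase name gmul marks it as an illustrative version rather than the asker's code. In CUDA the same body could be marked __device__ (or __host__ __device__ to call it from both host and device).

```c
/* Multiply two bytes in GF(2^8) using shift-and-XOR multiplication.
   Assumption: reduction modulo the AES polynomial x^8 + x^4 + x^3 + x + 1
   (0x11B); the original excerpt is truncated before this point. */
unsigned char gmul(unsigned char a, unsigned char b)
{
    unsigned char p = 0;          /* accumulated product */
    unsigned char hi_bit_set;     /* was bit 7 of a set before shifting? */
    int counter;

    for (counter = 0; counter < 8; counter++) {
        if (b & 1)                /* lowest bit of b set: add (XOR) a into p */
            p ^= a;
        hi_bit_set = (unsigned char)(a & 0x80);
        a = (unsigned char)(a << 1);  /* multiply a by x, dropping bit 8 */
        if (hi_bit_set)
            a ^= 0x1B;            /* reduce: x^8 = x^4 + x^3 + x + 1 */
        b >>= 1;                  /* move on to the next bit of b */
    }
    return p;
}
```

For example, gmul(0x57, 0x83) returns 0xC1, which matches the worked multiplication example in the AES specification (FIPS 197).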

OSX Sierra TensorFlow build error: ld: file not found: @rpath/CUDA.framework/Versions/A/CUDA

Question: I have followed the instructions in https://gist.github.com/notilas/a30e29ce514970e821a34153c1e78b3f but cannot complete the build.

OSX: Sierra
TensorFlow version: 1.1.0 (Google says v1.2 does not support CUDA on OSX)
CUDA Toolkit: 8.0
cuDNN: 6.0
Xcode: 7.2.1
Anaconda: 4.2 (Python 3.5)

Error log:
ERROR: /Users/so041e/ml/tensorflow/tensorflow/python/BUILD:2534:1: Linking of rule '//tensorflow/python:_pywrap_tensorflow_internal.so' failed: link_dynamic_library.sh failed: error executing

Access/synchronization to local memory

Question: I'm pretty new to GPGPU programming. I'm trying to implement an algorithm that needs a lot of synchronization, so it uses only one work-group (the global and local sizes have the same value). I have the following problem: my program works correctly until the problem size exceeds 32.

__kernel void assort(
    __global float *array,
    __local float *currentOutput,
    __local float *stimulations,
    __local int *noOfValuesAdded,
    __local float *addedValue,
    __local float *positionToInsert,
    __local int *activatedIdx,
    _

CUDA block synchronization differences between GTS 250 and Fermi devices

Question: I've been working on a program in which I create a hash table in global memory. The code is completely functional (albeit slower) on a GTS 250, which is a compute capability 1.1 device. However, on a compute capability 2.0 device (C2050 or C2070), the hash table is corrupt (data is incorrect and pointers are sometimes wrong). The code works fine on both devices when only one block is used, but when two or more blocks are used, it works only on the GTS 250 and not on any Fermi device. I understand

What can I do about 'CUDA driver version is insufficient for CUDA runtime version'?

Question: When I go to /usr/local/cuda/samples/1_Utilities/deviceQuery and execute:

moose@pc09 /usr/local/cuda/samples/1_Utilities/deviceQuery $ sudo make clean
rm -f deviceQuery deviceQuery.o
rm -rf ../../bin/x86_64/linux/release/deviceQuery
moose@pc09 /usr/local/cuda/samples/1_Utilities/deviceQuery $ sudo make
"/usr/local/cuda-7.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch

AMD CPU versus Intel CPU for OpenCL

Question: Some friends and I want to use OpenCL. For this, we are looking to buy a new computer, but we are wondering which is better for OpenCL, AMD or Intel. The graphics card will be an Nvidia card, and we have no choice about the graphics card. We started out wanting to buy an Intel CPU, but after some research we found that AMD CPUs may be better with OpenCL. We didn't find any benchmarks comparing the two. So here are our questions: Is AMD better than Intel with OpenCL? Does it matter to have an Nvidia

Where to download the CUDA SDK from

Question: I have been searching the NVIDIA website for the GPU Computing SDK, as I am trying to build the Point Cloud Library (PCL) with CUDA support. However, on the NVIDIA website all I can find are links for the toolkit, not a single download link for the SDK. I found this post: "How can I download the latest version of the GPU computing SDK?" However, that solution seems outdated and does not seem to work.

Answer 1: The link that fritzone gave (https://developer.nvidia.com/cuda-downloads) is an