pycuda

How to profile PyCuda code with the Visual Profiler?

Submitted by China☆狼群 on 2019-12-28 03:04:06
Question: When I create a new session and tell the Visual Profiler to launch my python/pycuda scripts, I get the following error message:

Execution run #1 of program '' failed, exit code: 255

These are my preferences:

Launch: python "/pathtopycudafile/mysuperkernel.py"
Working Directory: "/pathtopycudafile/mysuperkernel.py"
Arguments: [empty]

I use CUDA 4.0 under Ubuntu 10.10, 64-bit. Profiling compiled examples works. P.S. I am aware of the SO question How to profile PyCuda code in Linux?, but it seems to be an …
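
One thing that matters when a profiler launches a PyCUDA script is that the script creates its context, finishes all GPU work, and tears the context down cleanly before exiting; otherwise the profiler may report a failed run with no output. The following is only a minimal sketch of a profile-friendly target script, with a placeholder kernel rather than the asker's code:

```python
# Minimal sketch of a profiler-friendly PyCUDA script (placeholder kernel).
import numpy as np
import pycuda.autoinit          # creates a context and registers clean teardown at exit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    a[idx] *= 2.0f;
}
""")

a = np.random.randn(256).astype(np.float32)
mod.get_function("doublify")(cuda.InOut(a), block=(256, 1, 1), grid=(1, 1))
cuda.Context.synchronize()      # make sure all work is finished before the script returns
```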

PyCUDA precision of matrix multiplication code

Submitted by 北慕城南 on 2019-12-24 02:33:34
Question: I am trying to learn CUDA and am using PyCUDA to write a simple matrix multiplication code. For two 4x4 randomly generated matrices I get the following solution:

Cuda:
[[ -5170.86181641 -21146.49609375  20690.02929688 -35413.9296875 ]
 [-18998.5         -3199.53271484  13364.62890625   7141.36816406]
 [ 31197.43164062  21095.02734375   1750.64453125  11304.63574219]
 [  -896.64978027  18424.33007812 -17135.00390625   7418.28417969]]

Python:
[[ -5170.86035156 -21146.49609375  20690.02929688 -35413.9296875 ]
 [-18998.5 …
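
The two results above differ only in the last few significant digits, which is the kind of discrepancy float32 rounding produces when the GPU accumulates products in a different order than NumPy. A small sketch (with stand-in matrices, not the asker's data) of comparing against a float64 reference with a tolerance instead of expecting digit-for-digit equality:

```python
# Sketch: float32 products should be compared with a tolerance, not for exact equality,
# because the order of accumulation (and hence rounding) differs between GPU and CPU.
import numpy as np

a = np.random.randn(4, 4).astype(np.float32)
b = np.random.randn(4, 4).astype(np.float32)

single = np.dot(a, b)                                        # float32 result
reference = np.dot(a.astype(np.float64), b.astype(np.float64))  # higher-precision reference

print(np.allclose(single, reference, rtol=1e-4))
# A GPU result computed from the same a and b should pass the same kind of check.
```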

How can I tell PyCUDA which GPU to use?

Submitted by 笑着哭i on 2019-12-23 15:36:28
Question: I have two NVidia cards in my machine, and both are CUDA capable. When I run the example script to get started with PyCUDA seen here: http://documen.tician.de/pycuda/ I get the error:

nvcc fatal : Value 'sm_30' is not defined for option 'gpu-architecture'

My computing GPU is compute capability 3.0, so sm_30 should be the right option for the nvcc compiler. My graphics GPU is only CC 1.2, so I thought maybe that's the problem. I've installed the CUDA 5.0 release for Linux with no errors, and …
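
For reference, instead of relying on pycuda.autoinit's default device choice, PyCUDA lets you enumerate the devices and create the context on an explicitly chosen one. A sketch (the device index 0 is an assumption; pick whichever index reports the expected compute capability):

```python
# Sketch: choose a CUDA device explicitly instead of using pycuda.autoinit.
import pycuda.driver as cuda

cuda.init()
for i in range(cuda.Device.count()):
    dev = cuda.Device(i)
    print(i, dev.name(), dev.compute_capability())

dev = cuda.Device(0)            # index of the compute-capability 3.0 card (assumption)
ctx = dev.make_context()
try:
    pass                        # build SourceModules and launch kernels here
finally:
    ctx.pop()
```

Depending on the PyCUDA version, setting the CUDA_DEVICE environment variable before importing pycuda.autoinit can have a similar effect.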

How to profile PyCuda code in Linux?

Submitted by 末鹿安然 on 2019-12-22 12:38:19
Question: I have a simple (tested) pycuda app and am trying to profile it. I've tried NVidia's Compute Visual Profiler, which runs the program 11 times and then emits this error:

NV_Warning: Ignoring the invalid profiler config option: fb0_subp0_read_sectors
Error : Profiler data file '/home/jguy/proj/gpu/tdbp/pyArch/temp_compute_profiler_0_0.csv' does not contain profiler output. This can happen when:
a) Profiling is disabled during the entire run of the application.
b) The application does not invoke any …
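
With the CUDA 4.x-era command-line profiler, the profiling environment variables must be set before the CUDA context is created, which for PyCUDA means before pycuda.autoinit is imported. A sketch of that ordering (the log file name is a placeholder, and the variables shown are the legacy COMPUTE_PROFILE family, not the asker's configuration):

```python
# Sketch: enable the legacy command-line profiler before any CUDA context exists.
import os
os.environ["COMPUTE_PROFILE"] = "1"
os.environ["COMPUTE_PROFILE_CSV"] = "1"
os.environ["COMPUTE_PROFILE_LOG"] = "pycuda_profile.csv"   # placeholder name

import pycuda.autoinit          # the context is created here, with profiling enabled
import pycuda.driver as cuda

# ... build kernels and run the workload as usual ...

cuda.Context.synchronize()      # flush outstanding work so counters are recorded
```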

Python Multiprocessing with PyCUDA

Submitted by 烈酒焚心 on 2019-12-20 08:39:16
Question: I've got a problem that I want to split across multiple CUDA devices, but I suspect my current system architecture is holding me back. What I've set up is a GPU class, with functions that perform operations on the GPU (strange, that). These operations are of the style:

for iteration in range(maxval):
    result[iteration] = gpuinstance.gpufunction(arguments, iteration)

I'd imagined that there would be N gpuinstances for N devices, but I don't know enough about multiprocessing to see the simplest way …
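
The usual shape for this kind of split is one worker process per device, with each process initializing the driver and owning its own context, since CUDA contexts cannot be shared across processes. A rough sketch along those lines (the names and the fixed device count are assumptions, not the asker's code):

```python
# Sketch: one worker process per CUDA device, each with its own context.
import multiprocessing
import pycuda.driver as cuda

def worker(device_id, results):
    cuda.init()                                   # initialize the driver in this process
    ctx = cuda.Device(device_id).make_context()
    try:
        # build the SourceModule and run this device's share of the iterations here
        results.put((device_id, "done"))
    finally:
        ctx.pop()

if __name__ == "__main__":
    num_devices = 2                               # assumption; could query cuda.Device.count()
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(i, results))
             for i in range(num_devices)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    while not results.empty():
        print(results.get())
```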

PyCuda Error in Execution

Submitted by 99封情书 on 2019-12-13 23:54:42
Question: This is my pycuda code for rotation. I have installed the latest CUDA drivers and I use an NVIDIA GPU with CUDA support. I have also installed the CUDA toolkit and the PyCUDA drivers. Still I get this strange error.

import pycuda.driver as cuda
import pycuda.compiler
import pycuda.autoinit
import numpy
from math import pi, cos, sin

_rotation_kernel_source = """
texture<float, 2> tex;

__global__ void copy_texture_kernel(
    const float resize_val,
    const float alpha,
    unsigned short oldiw,
    unsigned short …
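
The excerpt cuts off before the error itself, but texture setup is often where this kind of code goes wrong. Below is only a stripped-down sketch of binding a NumPy array to a 2D float texture and reading it back in a kernel; the sizes, names, and kernel are placeholders, not the asker's rotation code:

```python
# Sketch: bind a NumPy array to a 2D float texture and read it back in a kernel.
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

mod = SourceModule("""
texture<float, 2> tex;

__global__ void read_texture(float *out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)
        out[y * w + x] = tex2D(tex, x, y);
}
""")

w = h = 64
img = np.random.rand(h, w).astype(np.float32)

texref = mod.get_texref("tex")
cuda.matrix_to_texref(img, texref, order="C")     # copy the array and bind it to the texture

out = np.zeros_like(img)
mod.get_function("read_texture")(
    cuda.Out(out), np.int32(w), np.int32(h),
    block=(16, 16, 1), grid=(w // 16, h // 16),
    texrefs=[texref])

print(np.allclose(out, img))
```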

PyCuda / Multiprocessing Issue on OS X 10.8

Submitted by 风格不统一 on 2019-12-13 15:33:19
Question: I'm working on a project where I distribute compute tasks to multiple Python Processes, each associated with its own CUDA device. When spawning the subprocesses, I use the following code:

import pycuda.driver as cuda

class ComputeServer(object):
    def _init_workers(self):
        self.workers = []
        cuda.init()
        for device_id in range(cuda.Device.count()):
            print "initializing device {}".format(device_id)
            worker = CudaWorker(device_id)
            worker.start()
            self.workers.append(worker)

The CudaWorker is defined in …
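
One detail worth flagging in this pattern: the driver is initialized (and devices enumerated) in the parent before the workers are spawned, and on some platforms CUDA state does not survive the fork. A sketch of the alternative, where every PyCUDA call is deferred to the child's run method (the class name follows the excerpt, but the body and the fixed device count are assumptions):

```python
# Sketch: defer every PyCUDA call to the child process; the parent never touches CUDA.
import multiprocessing

class CudaWorker(multiprocessing.Process):
    def __init__(self, device_id):
        super(CudaWorker, self).__init__()
        self.device_id = device_id

    def run(self):
        import pycuda.driver as cuda      # imported here so the parent stays CUDA-free
        cuda.init()
        ctx = cuda.Device(self.device_id).make_context()
        try:
            pass                          # compile kernels and process tasks here
        finally:
            ctx.pop()

if __name__ == "__main__":
    num_devices = 2                       # assumption; the parent avoids asking CUDA directly
    workers = [CudaWorker(i) for i in range(num_devices)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```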

PyCuda: Dereferencing Array Element Via Pointer in Cuda Kernel

Submitted by 你离开我真会死。 on 2019-12-13 08:18:33
Question: I am using PyCUDA to pass pairs of arrays to a CUDA kernel via a pointer. The arrays are the output of a different kernel, so the data is already on the GPU. Within the kernel, I'm trying to access elements in each of the arrays to do a vector subtraction. The values that I'm getting for the elements in the array are not correct (h & p are wrong in the code below). Can anyone help me see what I am doing wrong? My code:

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler …
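
For reference, the plainest way to hand two device-resident arrays to a kernel in PyCUDA is to keep them as GPUArrays and pass their device pointers into the call. This is only a sketch of a vector subtraction in that style; the names h and p are reused for readability and the kernel is not the asker's:

```python
# Sketch: pass device pointers of GPU-resident arrays into a subtraction kernel.
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void vec_sub(const float *h, const float *p, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = h[i] - p[i];
}
""")

n = 256
h = gpuarray.to_gpu(np.random.rand(n).astype(np.float32))   # already on the GPU
p = gpuarray.to_gpu(np.random.rand(n).astype(np.float32))
out = gpuarray.empty_like(h)

vec_sub = mod.get_function("vec_sub")
vec_sub(h.gpudata, p.gpudata, out.gpudata, np.int32(n),
        block=(128, 1, 1), grid=((n + 127) // 128, 1))

print(np.allclose(out.get(), h.get() - p.get()))
```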

pycuda shared memory error “pycuda._driver.LogicError: cuLaunchKernel failed: invalid value”

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-13 04:42:21
Question: I have a strange problem whose origin I cannot determine: I have a working kernel for a special matrix-vector multiplication which I want to speed up. Basically, the big matrix (10^6 by 10^6) is constructed from a few small matrices, so I want to put that data in shared memory. However, when I try to add the shared memory, I only get the error:

pycuda._driver.LogicError: cuLaunchKernel failed: invalid value

So my working kernel is:

#define FIELD_SIZE {field}
#define BLOCK_SIZE {block}
_…
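
As background for this error: when the shared memory is declared as extern __shared__, its size is requested per launch through PyCUDA's shared= keyword, and an oversized request (or a badly sized launch) fails at cuLaunchKernel. A minimal sketch of that mechanism, with a placeholder kernel rather than the asker's matrix-vector code:

```python
# Sketch: dynamic shared memory is requested per launch via the shared= keyword,
# and the request must stay within the device's per-block limit.
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

mod = SourceModule("""
extern __shared__ float buf[];          // sized at launch time

__global__ void scale(float *a, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        buf[threadIdx.x] = a[i];
        __syncthreads();
        a[i] = buf[threadIdx.x] * s;
    }
}
""")

n, block = 1024, 256
a = np.random.rand(n).astype(np.float32)

nbytes = block * np.dtype(np.float32).itemsize
limit = pycuda.autoinit.device.get_attribute(
    cuda.device_attribute.MAX_SHARED_MEMORY_PER_BLOCK)
assert nbytes <= limit                   # an oversized request fails at cuLaunchKernel

mod.get_function("scale")(cuda.InOut(a), np.float32(2.0), np.int32(n),
                          block=(block, 1, 1), grid=(n // block, 1),
                          shared=nbytes)
```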

“Peer access” failed when using pycuda and tensorflow together

Submitted by て烟熏妆下的殇ゞ on 2019-12-13 03:01:48
Question: I have some code in Python 3 like this:

import numpy as np
import pycuda.driver as cuda
from pycuda.compiler import SourceModule, compile
import tensorflow as tf

# create device and context
cudadevice = cuda.Device(gpuid1)
cudacontext = cudadevice.make_context()

config = tf.ConfigProto()
config.gpu_options.visible_device_list = '{}'.format(gpuid2)
sess = tf.Session(config=config)

# compile from a .cu file
cuda_mod = SourceModule(cudaCode, include_dirs = [dir_path], no_extern_c = True, options = ['-O0 …
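
The excerpt stops before the "peer access" failure itself, but one thing that commonly causes friction when mixing PyCUDA and TensorFlow in one process is leaving the hand-made PyCUDA context current while TensorFlow initializes its own. A hedged sketch of keeping the two out of each other's way (device ids and the surrounding code are assumptions, not a confirmed fix for this question):

```python
# Sketch: keep the PyCUDA context popped except around explicit kernel launches,
# so TensorFlow initializes on its own device without a foreign context being current.
import pycuda.driver as cuda

gpuid1 = 0                                # assumption: the device PyCUDA should use
cuda.init()
cudadevice = cuda.Device(gpuid1)
cudacontext = cudadevice.make_context()
cudacontext.pop()                         # make it non-current before TensorFlow starts

# ... create the TensorFlow session pinned to the other GPU here ...

cudacontext.push()                        # make it current only while launching kernels
# ... launch PyCUDA kernels ...
cudacontext.pop()

cudacontext.detach()                      # release the context at shutdown
```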