nvidia

Memory allocation Nvidia vs AMD

Submitted by 淺唱寂寞╮ on 2020-01-13 20:36:10
Question: I know there is a 128MB limit for a single block of GPU memory on AMD GPUs. Is there a similar limit on Nvidia GPUs? Answer 1: On a GTX 560, clGetDeviceInfo returns 256MiB for CL_DEVICE_MAX_MEM_ALLOC_SIZE; however, I can allocate slightly less than 1GiB. See this thread discussing the issue. On AMD, however, this limit is enforced. You can raise it by changing the GPU_MAX_HEAP_SIZE and GPU_MAX_ALLOC_SIZE environment variables (see this thread). Answer 2: You can query this information at runtime using
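As a rough sketch of the runtime query that Answer 2 begins to describe (not part of the original answer), the allocation limit can be read with clGetDeviceInfo; the helper name here is made up and error handling is omitted:

    #include <CL/cl.h>
    #include <stdio.h>

    /* Prints the largest single allocation and the total global memory
       the OpenCL driver reports for an already-obtained device. */
    void print_alloc_limits(cl_device_id device)
    {
        cl_ulong max_alloc = 0, global_mem = 0;

        clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                        sizeof(max_alloc), &max_alloc, NULL);
        clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                        sizeof(global_mem), &global_mem, NULL);

        printf("Max single allocation: %llu MiB\n",
               (unsigned long long)(max_alloc >> 20));
        printf("Total global memory:   %llu MiB\n",
               (unsigned long long)(global_mem >> 20));
    }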

NVIDIA Optimus card not switching under OpenGL

Submitted by 老子叫甜甜 on 2020-01-13 18:09:12
Question: When I used glGetString(GL_VERSION) and glGetString(GL_SHADING_LANGUAGE_VERSION) to check the OpenGL version on my computer, I got the following information: 3.1.0 - Build 8.15.10.2538 for GL_VERSION and 1.40 - Intel Build 8.15.10.2538 for GL_SHADING_LANGUAGE_VERSION. When I ran "Geeks3D GPU Caps Viewer", it showed the OpenGL version of my graphics card (NVS 4200M) as GL_VERSION: 4.3.0 and GLSL version: 4.30 NVIDIA via Cg compiler. Does that mean my graphics card only supports some OpenGL 4.3
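For context, a minimal sketch (not from the original post) of how to tell which GPU actually owns the current context: on Optimus systems the GL_VENDOR and GL_RENDERER strings reveal whether the Intel or the NVIDIA adapter was used.

    #include <GL/gl.h>
    #include <cstdio>

    // Assumes an OpenGL context has already been created and made current
    // (via GLUT, GLFW, WGL, ...). On an Optimus laptop the vendor and
    // renderer strings show which of the two GPUs the context landed on.
    void print_gl_info()
    {
        printf("GL_VENDOR:   %s\n", (const char*)glGetString(GL_VENDOR));
        printf("GL_RENDERER: %s\n", (const char*)glGetString(GL_RENDERER));
        printf("GL_VERSION:  %s\n", (const char*)glGetString(GL_VERSION));
    }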

What is the difference between cudaMemcpy() and cudaMemcpyPeer() for P2P-copy?

Submitted by ε祈祈猫儿з on 2020-01-13 04:55:06
Question: I want to copy data from GPU0-DDR to GPU1-DDR directly, without going through CPU RAM. As said here on page 15 of http://people.maths.ox.ac.uk/gilesm/cuda/MultiGPU_Programming.pdf:

Peer-to-Peer Memcpy
- Direct copy from pointer on GPU A to pointer on GPU B
- With UVA, just use cudaMemcpy(…, cudaMemcpyDefault)
- Or cudaMemcpyAsync(…, cudaMemcpyDefault)
- Also non-UVA explicit P2P copies:
- cudaError_t cudaMemcpyPeer( void * dst, int dstDevice, const void* src, int srcDevice, size_t count )
- cudaError_t
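For illustration, a minimal sketch of both variants from that slide, assuming two peer-capable devices and device pointers d0 (allocated on GPU 0) and d1 (allocated on GPU 1); error checking is trimmed, and in practice you would use only one of the two copies:

    #include <cuda_runtime.h>

    // Copies 'bytes' bytes from GPU 0 memory (d0) to GPU 1 memory (d1).
    void copy_gpu0_to_gpu1(void* d1, const void* d0, size_t bytes)
    {
        // Optional but usually desirable: enable peer access in both
        // directions so copies go directly between the GPUs instead of
        // being staged through host memory.
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);

        // Variant 1: with Unified Virtual Addressing, plain cudaMemcpy
        // can work out which device each pointer belongs to.
        cudaMemcpy(d1, d0, bytes, cudaMemcpyDefault);

        // Variant 2: explicit peer copy naming both devices; this also
        // works without UVA.
        cudaMemcpyPeer(d1, 1, d0, 0, bytes);
    }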

Unable to execute device kernel in CUDA

Submitted by 时光总嘲笑我的痴心妄想 on 2020-01-11 13:59:13
Question: I am trying to call a device kernel within a global kernel. My global kernel is a matrix multiplication and my device kernel finds the maximum value and its index in each column of the product matrix. Following is the code:

__device__ void MaxFunction(float* Pd, float* max)
{
    int x = (threadIdx.x + blockIdx.x * blockDim.x);
    int y = (threadIdx.y + blockIdx.y * blockDim.y);
    int k = 0;
    int temp = 0;
    int temp_idx = 0;
    for (k = 0; k < wB; ++k) {
        if (Pd[x*wB + y] > temp) {
            temp = Pd[x*wB + y];
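For reference, a trimmed-down sketch of the general pattern being attempted (a __global__ kernel calling a __device__ helper); the names and the one-thread-per-column layout below are illustrative, not the asker's actual code:

    #include <cuda_runtime.h>

    // Device helper: returns the maximum of column 'col' of a
    // rows x cols row-major matrix and writes its row index to *idx.
    __device__ float column_max(const float* m, int rows, int cols,
                                int col, int* idx)
    {
        float best = m[col];
        *idx = 0;
        for (int r = 1; r < rows; ++r) {
            float v = m[r * cols + col];
            if (v > best) { best = v; *idx = r; }
        }
        return best;
    }

    // Global kernel: one thread per column calls the device helper.
    __global__ void max_per_column(const float* m, int rows, int cols,
                                   float* max_out, int* idx_out)
    {
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (col < cols)
            max_out[col] = column_max(m, rows, cols, col, &idx_out[col]);
    }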

How do the NVIDIA drivers assign device indices to GPUs?

Submitted by 我只是一个虾纸丫 on 2020-01-11 10:44:09
Question: Assume that on a single node there are several devices with different compute capabilities. How does NVIDIA rank them (by rank I mean the number assigned by cudaSetDevice)? Are there any general guidelines about this? Thanks. Answer 1: I believe the ordering of devices corresponding to cudaGetDevice and cudaSetDevice (i.e. the CUDA runtime enumeration order) is either based on a heuristic that determines the fastest device and puts it first, or based on PCI enumeration order. You can confirm this
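A quick sketch (mine, not the answer's) of how to inspect the order the runtime actually chose, listing each index next to its name, compute capability and PCI location; note also that the CUDA_DEVICE_ORDER environment variable (FASTEST_FIRST by default, or PCI_BUS_ID) selects between the two orderings described above:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main()
    {
        int count = 0;
        cudaGetDeviceCount(&count);

        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            // Name, compute capability and PCI location for each index.
            printf("device %d: %s, compute %d.%d, PCI bus %d, device %d\n",
                   i, prop.name, prop.major, prop.minor,
                   prop.pciBusID, prop.pciDeviceID);
        }
        return 0;
    }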

Multi-GPU profiling (Several CPUs , MPI/CUDA Hybrid)

Submitted by 你离开我真会死。 on 2020-01-10 19:58:10
Question: I had a quick look on the forums and I don't think this question has been asked already. I am currently working with an MPI/CUDA hybrid code, made by somebody else during his PhD. Each CPU has its own GPU. My task is to gather data by running the (already working) code and to implement extra things. Turning this code into a single-CPU / multi-GPU one is not an option at the moment (later, possibly). I would like to make use of performance profiling tools to analyse the whole thing. For now an
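One hedged suggestion that is not part of the question: when profiling hybrid MPI/CUDA code, it often helps to wrap the interesting regions in NVTX ranges so each rank's timeline shows named regions in the profiler. The labels below are arbitrary and the program must be linked against the NVTX library (-lnvToolsExt):

    #include <nvToolsExt.h>
    #include <cstdio>

    // Wraps one iteration in NVTX ranges so it appears as named regions
    // in the profiler timeline of the MPI rank that executed it.
    void profiled_step(int rank)
    {
        char label[64];
        std::snprintf(label, sizeof(label), "rank %d: compute", rank);

        nvtxRangePushA(label);
        // ... launch this rank's CUDA kernels here ...
        nvtxRangePop();

        nvtxRangePushA("halo exchange");
        // ... this rank's MPI communication here ...
        nvtxRangePop();
    }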

Automate changing connected displays in Windows 8

Submitted by 本秂侑毒 on 2020-01-10 19:33:10
Question: Short version: How do I automate changing multiple display settings? NVIDIA, 3x monitors (2x DVI and 1x HDMI); the GPU only supports 2 active monitors. Long version: I have an NVIDIA GeForce GTX 560 Ti which can run two displays simultaneously. It has two DVI connections and one HDMI. I often switch between using my two desktop monitors and connecting only one of the desktop monitors plus my TV over HDMI. I would like to automate the change back and forth using a batch script or other program

nvidia-smi Volatile GPU-Utilization explanation?

Submitted by 时光总嘲笑我的痴心妄想 on 2020-01-09 03:03:48
Question: I know that nvidia-smi -l 1 will give the GPU usage every second (similar to the following). However, I would appreciate an explanation of what Volatile GPU-Util really means. Is it the number of used SMs over total SMs, or the occupancy, or something else?

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M|
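For completeness, a hedged sketch of reading the same figure programmatically through NVML (the library nvidia-smi is built on); nvmlDeviceGetUtilizationRates reports the percentage of time over the last sample period during which at least one kernel was executing, which is essentially what the GPU-Util column shows. Error checking is omitted and the program links against -lnvidia-ml:

    #include <nvml.h>
    #include <cstdio>

    int main()
    {
        nvmlInit();

        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(0, &dev);

        nvmlUtilization_t util;
        nvmlDeviceGetUtilizationRates(dev, &util);

        // util.gpu:    % of the sample period with a kernel running
        // util.memory: % of the sample period with memory traffic
        printf("GPU util: %u%%, memory util: %u%%\n", util.gpu, util.memory);

        nvmlShutdown();
        return 0;
    }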

OpenGL: glVertexAttribPointer() fails with “Invalid value” for stride bigger than 2048 on new NVIDIA drivers

Submitted by 纵饮孤独 on 2020-01-07 05:45:08
Question: Did anyone else notice that it is no longer possible to call glVertexAttribPointer() with a stride larger than 2048 with new NVIDIA drivers (331.58 WHQL and above)? The call generates the OpenGL error Invalid value (1281). For example, the following minimal GLUT example will generate OpenGL error 1281 after testStride(2049); is called when using driver 331.58 WHQL:

#include <iostream>
#include <GL/glut.h>
#include <windows.h>

using namespace std;

PFNGLVERTEXATTRIBPOINTERPROC
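As a hedged aside, the 2048 figure matches the GL_MAX_VERTEX_ATTRIB_STRIDE limit that OpenGL 4.4 introduced. A minimal check, assuming a current context and a loader such as GLEW or glad that exposes the enum, might look like:

    #include <GL/glew.h>   // any loader that defines GL_MAX_VERTEX_ATTRIB_STRIDE
    #include <cstdio>

    // Call with a current OpenGL context.
    void check_stride_limit()
    {
        GLint max_stride = 0;
        glGetIntegerv(GL_MAX_VERTEX_ATTRIB_STRIDE, &max_stride);
        printf("GL_MAX_VERTEX_ATTRIB_STRIDE = %d\n", max_stride);

        // Strides passed to glVertexAttribPointer above this value are
        // expected to raise GL_INVALID_VALUE (1281) on conforming drivers.
    }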

Kivy OpenGL requirements feasible for deployment?

Submitted by 纵饮孤独 on 2020-01-05 08:24:29
Question: I'm currently in the process of finding a nice GUI framework for my new project, and Kivy looks quite good. There are many questions here (like this one) about Kivy requiring OpenGL >2.0 (not accepting 1.4) and the problems arising from that. As I understand it, it's the graphics driver's job to provide a decent OpenGL version. I'm concerned about the problems I'll have deploying my app to users whose configuration means they will not be willing or able to have OpenGL >2.0 on their desktop.