nvidia

Transferring textures across adapters in DirectX 11

大兔子大兔子 submitted on 2019-12-10 20:54:13
Question: I'm capturing the desktop with the Desktop Duplication API on one GPU and need to copy the texture (which is in GPU memory) to another GPU. To do this I have a capture thread that acquires the desktop image, then copies it to a staging resource (created on the same device) using ID3D11DeviceContext::CopyResource. I then map that staging resource with Read, map the destination dynamic resource (which was created on the other device) with WriteDiscard, and copy the data. On the rendering thread…
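
A minimal sketch of the map/copy/unmap step the question describes, assuming staging was created with D3D11_USAGE_STAGING and CPU read access on the capture device, and dynamic with D3D11_USAGE_DYNAMIC and CPU write access on the render device. The names srcCtx, dstCtx, staging, dynamic, height, and rowBytes are hypothetical, not from the post:

    D3D11_MAPPED_SUBRESOURCE src = {}, dst = {};
    if (SUCCEEDED(srcCtx->Map(staging, 0, D3D11_MAP_READ, 0, &src))) {
        if (SUCCEEDED(dstCtx->Map(dynamic, 0, D3D11_MAP_WRITE_DISCARD, 0, &dst))) {
            // The two resources can have different row pitches, so copy row by row.
            BYTE* s = static_cast<BYTE*>(src.pData);
            BYTE* d = static_cast<BYTE*>(dst.pData);
            for (UINT y = 0; y < height; ++y)
                memcpy(d + y * dst.RowPitch, s + y * src.RowPitch, rowBytes);
            dstCtx->Unmap(dynamic, 0);
        }
        srcCtx->Unmap(staging, 0);
    }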

Get temperature from NVidia GPU using NVAPI

余生长醉 submitted on 2019-12-10 19:55:10
Question: I have been trying for the last few days to get the temperature of my GPU in C++ using NVAPI. I have the following code:

    #include "stdafx.h"
    #include "nvapi.h"

    int _tmain(int argc, _TCHAR* argv[]) {
        NvAPI_Status ret = NVAPI_OK;
        int i = 0;
        NvDisplayHandle hDisplay_a[NVAPI_MAX_PHYSICAL_GPUS*2] = {0};
        ret = NvAPI_Initialize();
        if (!ret == NVAPI_OK) {
            NvAPI_ShortString string;
            NvAPI_GetErrorMessage(ret, string);
            printf("NVAPI NvAPI_Initialize: %s\n", string);
        }
        NvAPI_ShortString ver;
        NvAPI…
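
A minimal sketch (not the asker's full program) of reading the core temperature once NvAPI_Initialize() has succeeded, via NvAPI_EnumPhysicalGPUs and NvAPI_GPU_GetThermalSettings. Struct and constant names follow the public nvapi.h headers; check your SDK version:

    #include <cstdio>
    #include "nvapi.h"

    int main() {
        if (NvAPI_Initialize() != NVAPI_OK) return 1;

        NvPhysicalGpuHandle gpus[NVAPI_MAX_PHYSICAL_GPUS] = {0};
        NvU32 gpuCount = 0;
        if (NvAPI_EnumPhysicalGPUs(gpus, &gpuCount) != NVAPI_OK) return 1;

        for (NvU32 i = 0; i < gpuCount; ++i) {
            NV_GPU_THERMAL_SETTINGS thermal = {0};
            thermal.version = NV_GPU_THERMAL_SETTINGS_VER;
            // Sensor index 0 is the GPU core sensor on most boards.
            if (NvAPI_GPU_GetThermalSettings(gpus[i], 0, &thermal) == NVAPI_OK)
                printf("GPU %u: %d C\n", i, thermal.sensor[0].currentTemp);
        }
        NvAPI_Unload();
        return 0;
    }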

Troubles with cudaMemcpyToSymbol

只愿长相守 submitted on 2019-12-10 18:31:04
Question: I'm trying to copy to constant memory, but I can't, because of my misunderstanding of how the cudaMemcpyToSymbol function is used. I'm trying to follow this. Here is some code:

    __device__ __constant__ double var1;
    __device__ __constant__ int var2;

    int main() {
        // ... some code here ...
        double var1ToCopy = 10.1;
        int var2ToCopy = 1;
        void* p1 = &var1ToCopy;
        void* p2 = &var2ToCopy;
        cudaStatus = cudaMemcpyToSymbol((void*)&var1, p1, sizeof(double), 0, cudaMemcpyHostToDevice);
        if (cudaStatus != cudaSuccess) {…
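
A corrected sketch of the copy, assuming CUDA 5.0 or later: the __constant__ symbol itself is passed as the first argument, not a host-side address of it, and the offset and direction arguments can be left at their defaults (0, cudaMemcpyHostToDevice):

    #include <cstdio>
    #include <cuda_runtime.h>

    __device__ __constant__ double var1;
    __device__ __constant__ int var2;

    int main() {
        double var1ToCopy = 10.1;
        int var2ToCopy = 1;

        // Pass the symbol directly; the runtime resolves its device address.
        cudaError_t status = cudaMemcpyToSymbol(var1, &var1ToCopy, sizeof(double));
        if (status != cudaSuccess)
            printf("var1: %s\n", cudaGetErrorString(status));

        status = cudaMemcpyToSymbol(var2, &var2ToCopy, sizeof(int));
        if (status != cudaSuccess)
            printf("var2: %s\n", cudaGetErrorString(status));
        return 0;
    }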

Is there really a timeout for kernels on nvidia gpus?

感情迁移 submitted on 2019-12-10 16:59:02
Question: While searching for answers to why my kernels produce strange error messages or "0"-only results, I found an answer on SO that mentions a timeout of 5 s for kernels running on NVIDIA GPUs. I googled for the timeout but could not find confirming sources or more information. What do you know about it? Could the timeout cause strange behaviour for kernels with a long runtime? Thanks!

Answer 1: Further googling brought up this in the CUDA_Toolkit_Release_Notes_Linux.txt (Known Issues): #…
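
The limit in question is the display driver's watchdog, which only applies to a GPU that is driving a display. A small check (an illustration, not from the thread): cudaDeviceProp exposes kernelExecTimeoutEnabled, which is 1 when a run-time limit is active on that device:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int d = 0; d < count; ++d) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            printf("Device %d (%s): watchdog timeout %s\n", d, prop.name,
                   prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
        }
        return 0;
    }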

How to get the ID of GPU allocated to a SLURM job on a multiple GPUs node?

妖精的绣舞 submitted on 2019-12-10 16:56:16
问题 When I submit a SLURM job with the option --gres=gpu:1 to a node with two GPUs, how can I get the ID of the GPU which is allocated for the job? Is there an environment variable for this purpose? The GPUs I'm using are all nvidia GPUs. Thanks. 回答1: You can get the GPU id with the environment variable CUDA_VISIBLE_DEVICES . This variable is a comma separated list of the GPU ids assigned to the job. 回答2: Slurm stores this information in an environment variable, SLURM_JOB_GPUS . One way to keep
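
A tiny sketch of reading both variables from inside the job step; each is a plain comma-separated list when set:

    #include <cstdio>
    #include <cstdlib>

    int main() {
        // Set by SLURM for jobs that request GPUs with --gres=gpu:N.
        const char* visible = getenv("CUDA_VISIBLE_DEVICES");
        const char* slurm   = getenv("SLURM_JOB_GPUS");
        printf("CUDA_VISIBLE_DEVICES=%s\n", visible ? visible : "(unset)");
        printf("SLURM_JOB_GPUS=%s\n", slurm ? slurm : "(unset)");
        return 0;
    }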

Determinant calculation with CUDA [closed]

谁说我不能喝 submitted on 2019-12-10 14:38:53
Question: (Closed as off-topic 4 years ago.) Is there any library or freely available code which will calculate the determinant of a small (6x6), double-precision matrix entirely on a GPU?

Answer 1: Here is the plan: you will need to buffer hundreds of these tiny matrices and launch the kernel once to compute the determinants for all of them at once. I am not…
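
A sketch of that batched approach, assuming the matrices are stored back to back in row-major order: one thread computes the determinant of one 6x6 matrix via Gaussian elimination with partial pivoting, so the single launch covers the whole buffer:

    #include <cuda_runtime.h>
    #include <cmath>

    __global__ void det6x6Batched(const double* mats, double* dets, int n) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= n) return;

        // Copy this thread's matrix into registers/local memory.
        double a[6][6];
        for (int r = 0; r < 6; ++r)
            for (int c = 0; c < 6; ++c)
                a[r][c] = mats[idx * 36 + r * 6 + c];

        double det = 1.0;
        for (int k = 0; k < 6; ++k) {
            // Partial pivoting keeps the elimination numerically stable.
            int pivot = k;
            for (int r = k + 1; r < 6; ++r)
                if (fabs(a[r][k]) > fabs(a[pivot][k])) pivot = r;
            if (pivot != k) {
                for (int c = 0; c < 6; ++c) {
                    double t = a[k][c]; a[k][c] = a[pivot][c]; a[pivot][c] = t;
                }
                det = -det;  // each row swap flips the sign
            }
            det *= a[k][k];
            if (a[k][k] == 0.0) break;  // singular: determinant is 0
            for (int r = k + 1; r < 6; ++r) {
                double f = a[r][k] / a[k][k];
                for (int c = k; c < 6; ++c) a[r][c] -= f * a[k][c];
            }
        }
        dets[idx] = det;
    }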

AMD vs NVIDIA. How do they differentiate in terms of support of OpenCL?

橙三吉。 submitted on 2019-12-10 12:12:30
Question: I have an EC2 instance. Its specs are: g2.2xlarge instance, Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, NVIDIA GRID GPU (Kepler GK104), with Ubuntu 14.04 64-bit. I have two questions.

1. After installing the CUDA toolkit on this system, I have the following output when using clinfo:

    clinfo: /usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so.1: no version information available (required by clinfo)
    Platform Version: OpenCL 1.2 CUDA 8.0.46
    Platform Name: NVIDIA CUDA
    Platform Vendor:…
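
For reference, a minimal platform query (an illustration of what clinfo itself does) using the standard OpenCL C API, linked against the same libOpenCL.so that clinfo found:

    #include <cstdio>
    #include <CL/cl.h>

    int main() {
        cl_uint numPlatforms = 0;
        clGetPlatformIDs(0, nullptr, &numPlatforms);

        cl_platform_id platforms[8];
        clGetPlatformIDs(numPlatforms < 8 ? numPlatforms : 8, platforms, nullptr);

        for (cl_uint i = 0; i < numPlatforms && i < 8; ++i) {
            char name[256], version[256];
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, nullptr);
            clGetPlatformInfo(platforms[i], CL_PLATFORM_VERSION, sizeof(version), version, nullptr);
            printf("Platform: %s (%s)\n", name, version);
        }
        return 0;
    }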

CUDA Add Rows of a Matrix

主宰稳场 submitted on 2019-12-10 11:09:48
Question: I'm trying to add the rows of a 4800x9600 matrix together, resulting in a 1x9600 matrix. What I've done is split the 4800x9600 matrix into 9,600 vectors of length 4800 each; I then perform a reduction on the 4800 elements. The trouble is, this is really slow... Anyone got any suggestions? Basically, I'm trying to implement MATLAB's sum(...) function. Here is the code, which I've verified works fine; it's just really slow:

    void reduceRows(Matrix Dresult, Matrix DA) {
        // split DA into chunks
        Matrix…
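
A simpler layout than per-column reductions often wins here. A sketch, assuming the 4800x9600 matrix is row-major: one thread sums one column, and since adjacent threads touch adjacent addresses, every load is coalesced:

    #include <cuda_runtime.h>

    __global__ void sumRows(const float* A, float* out, int rows, int cols) {
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (col >= cols) return;

        float sum = 0.0f;
        for (int r = 0; r < rows; ++r)
            sum += A[r * cols + col];  // one row's stride between iterations
        out[col] = sum;
    }

    // Launch: sumRows<<<(9600 + 255) / 256, 256>>>(dA, dOut, 4800, 9600);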

nvInitDll redirect success malforming PATH

回眸只為那壹抹淺笑 submitted on 2019-12-10 10:47:06
Question: I've run into a very strange issue, and it seems like nobody else is having the same problem (according to Google). When I start "cmd" from Win+R and echo %PATH%, it's fine. But when I start cmd from another program like FreeCommander or Ant and echo %PATH%, I get nvInitDll: App c:\dev\java1.6.0_22\bin\java.exe - redirect success. (java.exe is replaced by whichever program is echoing PATH.) And of course PATH then doesn't work. I'm running Windows 7 64-bit. It worked for some time, but I can…

A question about the details about the distribution from blocks to SMs in CUDA

ぃ、小莉子 submitted on 2019-12-10 10:44:24
Question: Let me take hardware with compute capability 1.3 as an example: 30 SMs are available, so at most 240 blocks can be running at the same time (considering the limits on registers and shared memory, the restriction on the number of blocks may be much lower). Blocks beyond 240 have to wait for available hardware resources. My question is when those blocks beyond 240 will be assigned to SMs: once some blocks of the first 240 are completed, or when all of the first 240 blocks are…
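
One way to observe the scheduler directly (an illustration, not from the thread): have each block record which SM it ran on via the %smid PTX special register, then inspect the array on the host to see that waiting blocks are placed on SMs as earlier blocks drain, rather than in whole waves. The kernel name recordSm is hypothetical:

    #include <cuda_runtime.h>

    __device__ unsigned int smId() {
        unsigned int id;
        // %smid holds the ID of the SM executing this thread.
        asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
        return id;
    }

    __global__ void recordSm(unsigned int* smOfBlock) {
        if (threadIdx.x == 0)
            smOfBlock[blockIdx.x] = smId();
    }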