nvidia

What's the relation between nvidia driver, cuda driver and cuda toolkit?

对着背影说爱祢 submitted on 2020-01-02 03:51:04
Question: In the NVIDIA driver package there is libcuda.so. Is the CUDA driver the same thing as the NVIDIA driver? And what is the relation between the CUDA toolkit and libcuda.so? Answer 1: From the CUDA documentation, http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#driver-api and http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#versioning-and-compatibility, it seems the CUDA driver is libcuda.so, which is included in the NVIDIA driver and is used by the CUDA runtime API. The NVIDIA driver includes the kernel-mode driver module and …
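One way to see the two layers next to each other is to query both version numbers from a small program; a minimal sketch, assuming the CUDA toolkit is installed and the file is compiled with nvcc (e.g. nvcc versions.cu -o versions, a hypothetical file name):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int driverVersion = 0, runtimeVersion = 0;

        // Reported by libcuda.so, i.e. the CUDA driver installed with the NVIDIA driver.
        cudaDriverGetVersion(&driverVersion);

        // Reported by libcudart, i.e. the CUDA runtime shipped with the CUDA toolkit.
        cudaRuntimeGetVersion(&runtimeVersion);

        printf("CUDA driver version:  %d.%d\n", driverVersion / 1000, (driverVersion % 100) / 10);
        printf("CUDA runtime version: %d.%d\n", runtimeVersion / 1000, (runtimeVersion % 100) / 10);
        return 0;
    }

Because the two components are installed separately, the two numbers can legitimately differ; the versioning rules in the linked documentation describe which combinations are compatible.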

How can I use 100% of VRAM on a secondary GPU from a single process on Windows 10?

▼魔方 西西 submitted on 2020-01-02 01:03:28
Question: This is on a Windows 10 computer with no monitor attached to the NVIDIA card. I've included output from nvidia-smi showing that > 5.04G was available. Here is the TensorFlow code asking it to allocate just slightly more than I had seen previously (I want this to be as close as possible to memory fraction = 1.0):

    config = tf.ConfigProto()
    # config.gpu_options.allow_growth = True
    config.gpu_options.per_process_gpu_memory_fraction = 0.84
    config.log_device_placement = True
    sess = tf.Session(config=config)

Just …

Video decoder on Cuda ffmpeg

亡梦爱人 submitted on 2020-01-01 19:37:27
Question: I am starting to implement a custom video decoder that uses the CUDA hardware decoder to produce YUV frames, which I will then encode. How can I fill the CUVIDPICPARAMS struct? Is it possible? My algorithm is as follows. To read the video stream packets I use the ffmpeg-dev libraries avcodec, avformat, etc. My steps (see the sketch below):
    1) Open the input file: avformat_open_input(&ff_formatContext, in_filename, nullptr, nullptr);
    2) Get the video stream properties: avformat_find_stream_info(ff_formatContext, nullptr);
    3) Get the video stream: ff_video_stream = ff…
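The three demuxing steps can be fleshed out roughly as follows; this is a sketch only, keeping the question's names ff_formatContext, in_filename and ff_video_stream, with an assumed placeholder input path, error handling reduced to early returns, and no attempt at the CUVIDPICPARAMS part (when the NVDEC/NVCUVID parser is used, that structure is normally filled in by the parser's decode callback rather than by hand):

    extern "C" {
    #include <libavformat/avformat.h>
    }

    int main()
    {
        const char *in_filename = "input.mp4";       // placeholder path
        AVFormatContext *ff_formatContext = nullptr;

        // On older FFmpeg versions, av_register_all() must be called first.

        // 1) Open the input file.
        if (avformat_open_input(&ff_formatContext, in_filename, nullptr, nullptr) < 0)
            return 1;

        // 2) Read the stream properties (codec parameters, duration, ...).
        if (avformat_find_stream_info(ff_formatContext, nullptr) < 0)
            return 1;

        // 3) Locate the video stream; its packets would feed the decoder.
        int video_index = av_find_best_stream(ff_formatContext, AVMEDIA_TYPE_VIDEO,
                                              -1, -1, nullptr, 0);
        if (video_index < 0)
            return 1;
        AVStream *ff_video_stream = ff_formatContext->streams[video_index];
        (void)ff_video_stream;

        avformat_close_input(&ff_formatContext);
        return 0;
    }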

How could we generate random numbers in CUDA C with different seed on each run?

时光总嘲笑我的痴心妄想 submitted on 2020-01-01 15:31:51
Question: I am working on a stochastic process and I want to generate a different series of random numbers in a CUDA kernel each time I run the program, similar to what we do in C++ by declaring seed = time(NULL) followed by srand(seed) and rand(). I can pass seeds from the host to the device via the kernel, but the problem with doing this is that I would have to pass an entire array of seeds into the kernel for each thread to have a different random seed each time. Is there a way I could generate a random seed …
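With cuRAND's device API, one common pattern is to pass a single host-side seed such as time(NULL) into the kernel and give every thread its own sequence number in curand_init, so no per-thread seed array is needed. A rough sketch, assuming the default XORWOW generator and a hypothetical kernel name randomKernel:

    #include <cstdio>
    #include <ctime>
    #include <curand_kernel.h>

    __global__ void randomKernel(unsigned long long seed, float *out)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;

        // Same seed for every thread, but a distinct sequence number per thread,
        // so each thread draws from its own independent subsequence.
        curandState state;
        curand_init(seed, tid, 0, &state);

        out[tid] = curand_uniform(&state);
    }

    int main()
    {
        const int n = 256;
        float *d_out;
        cudaMalloc(&d_out, n * sizeof(float));

        // A different seed on every run, as with srand(time(NULL)) on the CPU.
        unsigned long long seed = static_cast<unsigned long long>(time(nullptr));
        randomKernel<<<n / 64, 64>>>(seed, d_out);
        cudaDeviceSynchronize();

        float h_out[n];
        cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
        printf("first value: %f\n", h_out[0]);

        cudaFree(d_out);
        return 0;
    }

Different runs then produce different series because time(NULL) changes, while within a single run the per-thread sequence numbers keep the streams statistically independent.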

Are GPU Kepler CC3.0 processors not only pipelined architecture, but also superscalar? [closed]

ぐ巨炮叔叔 submitted on 2020-01-01 12:07:08
Question: Closed. This question is off-topic and is not currently accepting answers. Closed 4 years ago. The documentation for CUDA 6.5 (http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#ixzz3PIXMTktb, section 5.2.3, Multiprocessor Level) says: ... 8L for devices of compute capability 3.x, since a multiprocessor issues a pair of instructions per warp over one clock cycle for four warps at a time, as …

Run Tensorflow with NVIDIA TensorRT Inference Engine

主宰稳场 submitted on 2020-01-01 03:09:28
Question: I would like to use NVIDIA TensorRT to run my TensorFlow models. Currently, TensorRT supports Caffe prototxt network descriptor files. I was not able to find source code to convert TensorFlow models to Caffe models. Are there any workarounds? Answer 1: TensorRT 3.0 supports import/conversion of TensorFlow graphs via its UFF (Universal Framework Format). Some layer implementations are missing and will require custom implementations via the IPlugin interface. Previous versions didn't support native …

How to fix low volatile GPU-Util with Tensorflow-GPU and Keras?

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-31 16:53:35
Question: I have a 4-GPU machine on which I run TensorFlow (GPU) with Keras. Some of my classification problems take several hours to complete. nvidia-smi reports a Volatile GPU-Util that never exceeds 25% on any of my 4 GPUs. How can I increase the GPU utilization and speed up my training? Answer 1: If your GPU utilization is below 80%, this is generally the sign of an input pipeline bottleneck. What this means is that the GPU sits idle much of the time, waiting for the CPU to prepare the data. What you want is the CPU to …

NVidia drivers not running on AWS after restarting the AMI

不想你离开。 submitted on 2019-12-31 12:36:53
Question: Everybody, I have the following problem: I started a P2 instance with this AMI. I installed some tools like screen, torch, etc. Then I successfully ran some experiments using the GPU and created an image of the instance, so that I could terminate it and run it again later. Later I started a new instance from the AMI I had created before. Everything looked fine - screen, torch, and my experiments were present on the system - but I couldn't run the same experiments as before: NVIDIA-SMI has failed because …

Why do we need cudaDeviceSynchronize() in kernels with device-side printf?

老子叫甜甜 submitted on 2019-12-31 10:46:29
Question:

    __global__ void helloCUDA(float f)
    {
        printf("Hello thread %d, f=%f\n", threadIdx.x, f);
    }

    int main()
    {
        helloCUDA<<<1, 5>>>(1.2345f);
        cudaDeviceSynchronize();
        return 0;
    }

Why is cudaDeviceSynchronize() needed here, when in many places, for example after an ordinary kernel call, it is not required? Answer 1: A kernel launch is asynchronous. This means it returns control to the CPU thread immediately after starting up the GPU work, before the kernel has finished executing. So what is the next thing in the CPU thread here? …
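The standard explanation is that the next thing in the CPU thread is the end of main(): without the synchronization the program can terminate before the device-side printf buffer is flushed back to the host, so the output may never appear. A small sketch (a variation on the question's code, not taken from the original answer) that also checks the status returned by the synchronization:

    #include <cstdio>

    __global__ void helloCUDA(float f)
    {
        printf("Hello thread %d, f=%f\n", threadIdx.x, f);
    }

    int main()
    {
        helloCUDA<<<1, 5>>>(1.2345f);

        // Block the host until the kernel has finished; this is also the point
        // at which the device-side printf output is flushed to the host.
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess)
            fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

        return 0;
    }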