multi-gpu

cudaMemGetInfo returns same amount of free memory on both devices of GTX 690

Submitted by 99封情书 on 2019-12-01 11:16:26
I have run into problems with GeForce GTX 690 while trying to track down the memory usage. A simple test program: BOOST_AUTO_TEST_CASE(cudaMemoryTest) { size_t mem_tot_0 = 0; size_t mem_free_0 = 0; size_t mem_tot_1 = 0; size_t mem_free_1 = 0; unsigned int mem_size = 100*1000000; float* h_P = new float[mem_size]; for(size_t i = 0; i < mem_size; i++) { h_P[i] = 0.f; } cudaSetDevice(0); cudaDeviceReset(); cudaMemGetInfo(&mem_free_0, &mem_tot_0); std::cout<<"Free memory before copy dev 0: "<<mem_free_0<<std::endl; cudaSetDevice(1); cudaDeviceReset(); cudaMemGetInfo(&mem_free_1, &mem_tot_1); std…
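
The snippet above is cut off by the excerpt, but the pattern it relies on is simple: cudaMemGetInfo reports the free and total memory of whichever device was last selected with cudaSetDevice. A minimal self-contained sketch of that per-device query (an illustration, not the original Boost test case):

    // Query free/total memory on every visible CUDA device.
    // cudaMemGetInfo reports figures for the currently selected device only.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int dev = 0; dev < count; ++dev) {
            cudaSetDevice(dev);
            size_t free_bytes = 0, total_bytes = 0;
            cudaMemGetInfo(&free_bytes, &total_bytes);
            std::printf("device %d: free %zu MiB / total %zu MiB\n",
                        dev, free_bytes >> 20, total_bytes >> 20);
        }
        return 0;
    }

On a dual-GPU board such as the GTX 690 the two devices are physically identical, so identical free-memory figures are expected until something is actually allocated on one of them.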

How can I override the CUDA kernel execution time limit on Windows with a secondary GPU?

Submitted by 只愿长相守 on 2019-11-30 22:23:06
Nvidia's website explains the time-out problem: Q: What is the maximum kernel execution time? On Windows, individual GPU program launches have a maximum run time of around 5 seconds. Exceeding this time limit usually will cause a launch failure reported through the CUDA driver or the CUDA runtime, but in some cases can hang the entire machine, requiring a hard reset. This is caused by the Windows "watchdog" timer that causes programs using the primary graphics adapter to time out if they run longer than the maximum allowed time. For this reason it is recommended that CUDA is run on a …
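
Whether a particular device is subject to that watchdog can be checked at run time: cudaDeviceProp carries a kernelExecTimeoutEnabled flag, which is nonzero when the operating system enforces a run-time limit (typically only on the GPU that drives the display). A short sketch, not taken from the question:

    // Report which CUDA devices have the OS watchdog (e.g. Windows TDR) enabled.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop{};
            cudaGetDeviceProperties(&prop, dev);
            std::printf("device %d (%s): watchdog %s\n", dev, prop.name,
                        prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
        }
        return 0;
    }

Long-running kernels can then be launched on a device that reports the watchdog as disabled, i.e. a secondary GPU with no display attached.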

Do I have to use MPS (Multi-Process Service) when using CUDA 6.5 + MPI?

Submitted by 孤者浪人 on 2019-11-29 16:27:54
The linked document (https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf) says: 1.1. AT A GLANCE 1.1.1. MPS The Multi-Process Service (MPS) is an alternative, binary-compatible implementation of the CUDA Application Programming Interface (API). The MPS runtime architecture is designed to transparently enable co-operative multi-process CUDA applications, typically MPI jobs, to utilize Hyper-Q capabilities on the latest NVIDIA (Kepler-based) Tesla and Quadro GPUs. Hyper-Q allows CUDA kernels to be processed concurrently on the same GPU; this can benefit performance when the …
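
For context, MPS is started and stopped outside the application (through the nvidia-cuda-mps-control daemon); the CUDA code itself does not change. The usual MPI pattern, with or without MPS, is simply to map each rank to a device, roughly as in this sketch (it assumes an MPI installation, the CUDA runtime, and at least one visible GPU; it is not taken from the linked document):

    // Each MPI rank picks a device; several ranks may share one GPU.
    // Without MPS their contexts are time-sliced on each device; with MPS
    // their kernels can overlap via Hyper-Q. The program is identical either way.
    #include <cstdio>
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int dev_count = 0;
        cudaGetDeviceCount(&dev_count);
        cudaSetDevice(rank % dev_count);   // assumes dev_count >= 1

        std::printf("rank %d -> device %d\n", rank, rank % dev_count);
        MPI_Finalize();
        return 0;
    }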

Python: How do we parallelize a Python program to take advantage of a GPU server?

Submitted by 社会主义新天地 on 2019-11-29 14:43:38
In our lab, we have an NVIDIA Tesla K80 GPU accelerator server with the following characteristics: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz, 48 CPU processors, 128GB RAM, 12 CPU cores, running under 64-bit Linux. I am running the following code, which runs GridSearchCV after vertically appending different sets of dataframes into a single series for a RandomForestRegressor model. The two sample datasets I am considering are found in this link: import sys import imp import glob import os import pandas as pd import math from sklearn.feature_extraction.text import CountVectorizer from sklearn.feature…
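
As a CPU-only starting point (scikit-learn's RandomForestRegressor does not run on a GPU), GridSearchCV can at least spread the candidate fits over all available cores via its n_jobs parameter; using the K80 itself would mean switching to a GPU-capable library such as cuML or XGBoost. A minimal sketch with placeholder data, parameters chosen only for illustration:

    # Parallel hyper-parameter search on the CPU with n_jobs=-1 (all cores).
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GridSearchCV

    # Placeholder data standing in for the appended dataframes in the question.
    X = pd.DataFrame({"f1": range(200), "f2": range(200, 400)})
    y = 0.5 * X["f1"] + 0.1 * X["f2"]

    param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}
    search = GridSearchCV(
        RandomForestRegressor(random_state=0),
        param_grid,
        cv=3,
        n_jobs=-1,   # run candidate fits on every available CPU core
    )
    search.fit(X, y)
    print(search.best_params_)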

CUDA fails when trying to use both the onboard iGPU and an NVIDIA discrete card. How can I use both a discrete NVIDIA and an integrated (onboard) Intel GPU? [closed]

Submitted by 醉酒当歌 on 2019-11-29 13:06:25
I recently had some trouble making my PC (Ivy Bridge) use the onboard GPU (Intel HD 4000 iGPU) for normal screen display while I run my CUDA programs for computation on the discrete NVIDIA GT 640 in my machine. The problem was that with the iGPU driving the display, CUDA was unable to spot the NVIDIA card, and the NVIDIA drivers would not load at all. Keep in mind that there are confirmed issues (mostly about concurrency) when the NVIDIA Windows drivers are used for display devices and you also want to use CUDA. Those issues can be avoided when you use the Intel GPU for display (thus loading …
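
A useful first diagnostic in that situation is to ask the CUDA runtime what it can see: when the NVIDIA driver has not loaded, device enumeration itself typically fails rather than reporting zero devices. A generic sketch, not tied to the GT 640 setup described above:

    // Check whether CUDA can enumerate the discrete card while the iGPU
    // drives the display. A failed cudaGetDeviceCount usually indicates the
    // NVIDIA driver is not loaded at all.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            std::printf("CUDA cannot enumerate devices: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop{};
            cudaGetDeviceProperties(&prop, dev);
            std::printf("device %d: %s\n", dev, prop.name);
        }
        return 0;
    }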

Is there a way to programmatically select the rendering GPU in a multi-GPU environment? (Windows)

Submitted by 我是研究僧i on 2019-11-29 02:10:59
I have an OpenGL application that will run on machines with diverse multi-GPU configurations (and possibly different Windows versions, from XP to 7). Is there a general way to select the specific GPU that will act as the OpenGL renderer independently of the GPU combination (e.g. NVIDIA + NVIDIA, NVIDIA + AMD, NVIDIA + Intel, etc.)? It has to be a solution that can be applied from application code, i.e. directly in C++ or a script that would be called from the application, with no end …
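
There is no vendor-neutral WGL or OpenGL call for this on Windows. The closest widely used application-side mechanism is a pair of exported globals that NVIDIA Optimus and AMD PowerXpress drivers look for when deciding which GPU renders for a process; it is only a hint, it must live in the executable rather than a DLL, and it does not help when both GPUs are discrete cards. A sketch:

    // Ask hybrid-graphics drivers to run this process on the discrete GPU.
    // NVIDIA documents NvOptimusEnablement; AMD documents
    // AmdPowerXpressRequestHighPerformance. Other configurations ignore them.
    extern "C" {
        __declspec(dllexport) unsigned long NvOptimusEnablement = 0x00000001;
        __declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
    }

    int main() {
        // Create the window and OpenGL context as usual (WGL, GLFW, SDL, ...).
        return 0;
    }

On NVIDIA Quadro hardware specifically, the WGL_NV_gpu_affinity extension allows creating a context tied to an explicit GPU, but it is not exposed on GeForce boards.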
