gpu

cuDF - Not leveraging GPU cores

Submitted by 六眼飞鱼酱① on 2021-02-11 15:16:55

Question: I am running the piece of code below in Python with cuDF to speed up the process, but I do not see any difference in speed compared to my 4-core local machine CPU. The GPU configuration is 4 x NVIDIA Tesla T4.

def arima(train):
    h = []
    for each in train:
        model = pm.auto_arima(np.array(ast.literal_eval(each)))
        p = model.predict(1).item(0)
        h.append(p)
    return h

for t_df in pd.read_csv("testset.csv", chunksize=1000):
    t_df = cudf.DataFrame.from_pandas(t_df)
    t_df['predicted'] = arima(t_df['prev_sales'])
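For context, a minimal sketch (not the asker's pipeline) contrasting a columnar cuDF operation, which runs on the GPU, with a per-element Python loop, which runs on the CPU no matter what container holds the data. pm.auto_arima is plain CPU code, so wrapping the frame in cuDF would not be expected to accelerate that loop; the toy Series and timings below are illustrative assumptions.

import time
import cudf
import numpy as np

s = cudf.Series(np.random.rand(1_000_000))

t0 = time.time()
_ = (s * 2 + 1).sum()                    # columnar arithmetic, executed on the GPU
gpu_side = time.time() - t0

t0 = time.time()
_ = [x * 2 + 1 for x in s.to_pandas()]   # per-element Python loop, CPU only
loop_side = time.time() - t0

print(gpu_side, loop_side)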

PyTorch allocates memory for a small tensor on CPU and GPU but gets an error on a node with more than 400 GB

Submitted by 南笙酒味 on 2021-02-11 14:57:38

Question: I would like to build a torch.nn.Embedding with tensors on Databricks (the node is p2.8xlarge) using Python 3. My code:

import numpy as np
import torch
from torch import nn

num_embedding, num_dim = 14000, 300
embedding = nn.Embedding(num_embedding, num_dim)

row, col = 800000, 302
t = [[x for x in range(col)] for _ in range(row)]
t1 = torch.tensor(t)
print(t1.shape)  # torch.Size([800000, 302])
t1.dtype, t1.nelement()  # torch.int64, 241600000
type(t1), t1.device, (t1.nelement() * t1.element_size())/
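As a sanity check on the sizes involved, here is a small worked example of the memory arithmetic implied by the snippet; the shapes and dtype are taken from the question, everything else is illustrative.

import torch

row, col = 800000, 302
t1 = torch.empty(row, col, dtype=torch.int64)    # same shape/dtype as in the question
bytes_used = t1.nelement() * t1.element_size()   # 241,600,000 elements * 8 bytes
print(bytes_used / 1024**3, "GiB")               # ≈ 1.8 GiB for the index tensor alone

Note that embedding(t1) would produce a float32 tensor of shape (800000, 302, 300), roughly 800000 * 302 * 300 * 4 bytes ≈ 270 GiB on its own, so even a node with more than 400 GB of RAM can plausibly run out once gradients or temporary copies come into play.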

How does GPU parallelize different tasks?

Submitted by 有些话、适合烂在心里 on 2021-02-11 14:49:23

Question: I am really interested in understanding how the GPU parallelizes different tasks such as real-time rendering and training neural networks. I know the math behind parallelization, but I am curious how a GPU actually works. Real-time rendering and training neural networks are very different. How does the GPU parallelize these two tasks efficiently?

回答1 (Answer 1): GPU parallelization requires the problem to be split up into as many independent, equal computations as possible (SIMD). What in C++ looks
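A tiny Python/NumPy sketch of the idea the answer starts to describe: the same scalar update applied independently to every element is exactly the kind of work that can be split across many SIMD lanes or GPU threads. The example and its names are illustrative, not part of the original answer.

import numpy as np

x = np.random.rand(1_000_000).astype(np.float32)

# Serial formulation: one element at a time; every iteration is independent of the rest.
y_loop = np.empty_like(x)
for i in range(x.size):
    y_loop[i] = x[i] * 2.0 + 1.0

# Data-parallel formulation: one call over the whole array; on a GPU backend
# (CuPy, PyTorch, ...) each element could map to its own thread.
y_vec = x * 2.0 + 1.0

assert np.allclose(y_loop, y_vec)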

Pytorch RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select

Submitted by 帅比萌擦擦* on 2021-02-11 14:37:48

Question: I am training a model that takes tokenized strings, which are then passed through an embedding layer and an LSTM. However, there seems to be an error in the input, as it does not pass through the embedding layer.

class DrugModel(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim, drug_embed_dim,
                 lstm_layer, lstm_dropout, bi_lstm, linear_dropout, char_vocab_size,
                 char_embed_dim, char_dropout, dist_fn, learning_rate, binary,
                 is_mlp, weight_decay, is_graph, g_layer, g
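A minimal, self-contained sketch (not the asker's DrugModel) of the usual cause of this error: the module's weights live on the CUDA device while the input index tensor is still on the CPU, so the index_select underlying the embedding lookup sees mismatched devices. Moving both to the same device avoids it.

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

embedding = nn.Embedding(num_embeddings=100, embedding_dim=16).to(device)
tokens = torch.randint(0, 100, (8, 32))   # created on the CPU by default

# embedding(tokens) would raise the device-mismatch error when `device` is CUDA;
# sending the indices to the same device as the weights fixes it.
out = embedding(tokens.to(device))
print(out.shape)   # torch.Size([8, 32, 16])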

PyOpenCL / OpenCL: Can't build program on GPU

Submitted by 喜你入骨 on 2021-02-10 20:21:58

Question: I have a piece of kernel source which runs on the G970 on my PC but won't compile on my early-2015 MacBook Pro with Iris 6100 1536 MB graphics.

platform = cl.get_platforms()[0]
device = platform.get_devices()[1]   # Get the GPU ID
ctx = cl.Context([device])           # Tell CL to use the GPU
queue = cl.CommandQueue(ctx)         # Create a command queue for the target device.
# program = cl.Program(ctx, kernelsource).build()
print platform.get_devices()

This get_devices() call shows I have 'Intel(R) Core(TM) i5-5287U CPU @ 2
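When a kernel builds on one device but not another, the driver's build log usually says why. Below is a hedged sketch of how to surface it with PyOpenCL; the kernel source is a trivial placeholder, not the asker's, and the device index is copied from the question.

import pyopencl as cl

kernelsource = """
__kernel void copy(__global const float *src, __global float *dst) {
    int i = get_global_id(0);
    dst[i] = src[i];
}
"""

platform = cl.get_platforms()[0]
device = platform.get_devices()[1]        # same selection as in the question
ctx = cl.Context([device])

program = cl.Program(ctx, kernelsource)
try:
    program.build()
except cl.Error:
    # The compiler's build log is far more specific than the Python exception.
    print(program.get_build_info(device, cl.program_build_info.LOG))
    raise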

Why is my CPU doing matrix operations faster than my GPU?

Submitted by 江枫思渺然 on 2021-02-10 15:38:32

Question: When I tried to verify that the GPU does matrix operations faster than the CPU, I got unexpected results. The CPU performs better than the GPU in my experiment, which confuses me. I used the CPU and the GPU to do matrix multiplication respectively. The programming environment is MXNet with cuda-10.1.

With GPU:

import mxnet as mx
from mxnet import nd

x = nd.random.normal(shape=(100000,100000), ctx=mx.gpu())
y = nd.random.normal(shape=(100000,100000), ctx=mx.gpu())
%timeit nd.dot(x,y)

50.8 µs ± 1.76 µs per
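For what it's worth, MXNet enqueues GPU work asynchronously, so %timeit on nd.dot alone tends to measure only the kernel launch, not the computation. A hedged sketch of a synchronized timing follows; the shapes are deliberately small, since a (100000, 100000) float32 matrix would need about 40 GB and two of them would not fit on a typical GPU.

import time
import mxnet as mx
from mxnet import nd

x = nd.random.normal(shape=(4096, 4096), ctx=mx.gpu())
y = nd.random.normal(shape=(4096, 4096), ctx=mx.gpu())
nd.waitall()                  # make sure the inputs are actually materialized

start = time.time()
z = nd.dot(x, y)              # returns almost immediately: the op is only enqueued
enqueue_time = time.time() - start

start = time.time()
nd.waitall()                  # block until the GPU has finished the matmul
compute_time = time.time() - start

print(enqueue_time, compute_time)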

How to install XGBoost on macOS compiled with GPU support?

Submitted by 浪尽此生 on 2021-02-10 12:48:26

Question: I have been trying to install xgboost with GPU support on macOS Mojave (10.14.6) for the last 3 days, with no success. I tried 2 approaches: pip install xgboost. xgboost installs this way and runs successfully without the GPU option (i.e., without tree_method='gpu_hist'). I want to run with gpu_hist by setting tree_method='gpu_hist' in the tree parameters. When I set tree_method='gpu_hist', the following error came up: XGBoostError: [12:10:34] /Users
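For reference, a minimal sketch of how the parameter in question is normally passed; the data and hyperparameters below are illustrative, and the code only works once xgboost has been built with CUDA support, which the plain pip wheel generally is not.

import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "binary:logistic", "tree_method": "gpu_hist"}

# Raises an XGBoostError similar to the one quoted above if the installed
# build has no GPU support.
bst = xgb.train(params, dtrain, num_boost_round=10)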

Trouble with slow speeds in OpenCL

Submitted by 筅森魡賤 on 2021-02-08 15:03:53

Question: I am trying to use OpenCL for the first time; the goal is to calculate the argmin of each row in an array. Since the operation on each row is independent of the others, I thought this would be easy to put on the graphics card. I seem to get worse performance using this code than when I just run the code on the CPU with an outer for-loop; any help would be appreciated. Here is the code:

#pragma OPENCL EXTENSION cl_khr_fp64 : enable

int argmin(global double *array, int end) {
    double minimum =
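As a point of comparison, a quick CPU baseline for the same row-wise argmin is sketched below (illustrative shapes only). It is useful both for checking correctness and for seeing whether the OpenCL version is actually ahead, since for modest row lengths the transfer and launch overhead of the GPU path can easily dominate.

import numpy as np

arr = np.random.rand(10000, 256)
row_argmin = np.argmin(arr, axis=1)   # index of the minimum element in every row
print(row_argmin[:5])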

Access pixels in gpu::mat

Submitted by 南楼画角 on 2021-02-08 10:50:53

Question: I'd like to know how to access pixel information when using OpenCV on the GPU. I'm currently downloading the gpu::mat information to a Mat variable, but it is too slow. Does anyone know how to do it?

回答1 (Answer 1): You could access the data inside a kernel. For row and column numbers (row, col), the value of a pixel with channel number ch < 3 is:

uint8_t val = gpumat.data[(row * gpumat.step) + col * gpumat.channels() + ch];

So let's say you have a BGR input image stored in a GpuMat called src and you want to
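To make the indexing concrete, here is a small NumPy illustration of the same arithmetic as the answer's formula: for a packed 8-bit BGR image, the byte offset of channel ch at (row, col) is row*step + col*channels + ch. The image, shape, and indices below are made up for the example.

import numpy as np

rows, cols, channels = 480, 640, 3
img = np.random.randint(0, 256, size=(rows, cols, channels), dtype=np.uint8)
step = img.strides[0]                 # bytes per row (no padding in this example)

row, col, ch = 100, 200, 1            # green channel of pixel (100, 200)
flat_index = row * step + col * channels + ch
assert img.reshape(-1)[flat_index] == img[row, col, ch]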