gpu

cuDF - Not leveraging GPU cores

Submitted by 六眼飞鱼酱① on 2021-02-11 15:16:55

Question: I am running the piece of code below in Python with cuDF to speed up the process, but I do not see any difference in speed compared to my 4-core local machine CPU. The GPU configuration is 4 x NVIDIA Tesla T4.

def arima(train):
    h = []
    for each in train:
        model = pm.auto_arima(np.array(ast.literal_eval(each)))
        p = model.predict(1).item(0)
        h.append(p)
    return h

for t_df in pd.read_csv("testset.csv", chunksize=1000):
    t_df = cudf.DataFrame.from_pandas(t_df)
    t_df['predicted'] = arima(t_df['prev_sales'])
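For context, a minimal sketch (not the asker's pipeline) contrasting a columnar cuDF operation, which runs on the GPU, with a per-element Python loop, which runs on the CPU no matter what container holds the data. pm.auto_arima is plain CPU code, so wrapping the frame in cuDF would not be expected to accelerate that loop; the toy Series and timings below are illustrative assumptions.

import time
import cudf
import numpy as np

s = cudf.Series(np.random.rand(1_000_000))

t0 = time.time()
_ = (s * 2 + 1).sum()                    # columnar arithmetic, executed on the GPU
gpu_side = time.time() - t0

t0 = time.time()
_ = [x * 2 + 1 for x in s.to_pandas()]   # per-element Python loop, CPU only
loop_side = time.time() - t0

print(gpu_side, loop_side)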

PyTorch allocates memory for a small tensor on CPU and GPU but gets an error on a node with more than 400 GB

Submitted by 南笙酒味 on 2021-02-11 14:57:38

Question: I would like to build a torch.nn.Embedding with tensors on Databricks (the node is p2.8xlarge) using Python 3. My code:

import numpy as np
import torch
from torch import nn

num_embedding, num_dim = 14000, 300
embedding = nn.Embedding(num_embedding, num_dim)

row, col = 800000, 302
t = [[x for x in range(col)] for _ in range(row)]
t1 = torch.tensor(t)
print(t1.shape)  # torch.Size([800000, 302])
t1.dtype, t1.nelement()  # torch.int64, 241600000
type(t1), t1.device, (t1.nelement() * t1.element_size())/
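As a sanity check on the sizes involved, here is a small worked example of the memory arithmetic implied by the snippet; the shapes and dtype are taken from the question, everything else is illustrative.

import torch

row, col = 800000, 302
t1 = torch.empty(row, col, dtype=torch.int64)    # same shape/dtype as in the question
bytes_used = t1.nelement() * t1.element_size()   # 241,600,000 elements * 8 bytes
print(bytes_used / 1024**3, "GiB")               # ≈ 1.8 GiB for the index tensor alone

Note that embedding(t1) would produce a float32 tensor of shape (800000, 302, 300), roughly 800000 * 302 * 300 * 4 bytes ≈ 270 GiB on its own, so even a node with more than 400 GB of RAM can plausibly run out once gradients or temporary copies come into play.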

How does GPU parallelize different tasks?

Submitted by 有些话、适合烂在心里 on 2021-02-11 14:49:23

Question: I am really interested in understanding how the GPU parallelizes different tasks such as real-time rendering and training neural networks. I know the math behind parallelization, but I am curious how a GPU actually works. Real-time rendering and training neural networks are very different. How does the GPU parallelize these two tasks efficiently?

回答1 (Answer 1): GPU parallelization requires the problem to be split up into as many independent, equal computations as possible (SIMD). What in C++ looks
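A tiny Python/NumPy sketch of the idea the answer starts to describe: the same scalar update applied independently to every element is exactly the kind of work that can be split across many SIMD lanes or GPU threads. The example and its names are illustrative, not part of the original answer.

import numpy as np

x = np.random.rand(1_000_000).astype(np.float32)

# Serial formulation: one element at a time; every iteration is independent of the rest.
y_loop = np.empty_like(x)
for i in range(x.size):
    y_loop[i] = x[i] * 2.0 + 1.0

# Data-parallel formulation: one call over the whole array; on a GPU backend
# (CuPy, PyTorch, ...) each element could map to its own thread.
y_vec = x * 2.0 + 1.0

assert np.allclose(y_loop, y_vec)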

Pytorch RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select

Submitted by 帅比萌擦擦* on 2021-02-11 14:37:48

Question: I am training a model that takes tokenized strings, which are then passed through an embedding layer and an LSTM. However, there seems to be an error in the input, as it does not pass through the embedding layer.

class DrugModel(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim, drug_embed_dim,
                 lstm_layer, lstm_dropout, bi_lstm, linear_dropout, char_vocab_size,
                 char_embed_dim, char_dropout, dist_fn, learning_rate, binary,
                 is_mlp, weight_decay, is_graph, g_layer, g
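A minimal, self-contained sketch (not the asker's DrugModel) of the usual cause of this error: the module's weights live on the CUDA device while the input index tensor is still on the CPU, so the index_select underlying the embedding lookup sees mismatched devices. Moving both to the same device avoids it.

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

embedding = nn.Embedding(num_embeddings=100, embedding_dim=16).to(device)
tokens = torch.randint(0, 100, (8, 32))   # created on the CPU by default

# embedding(tokens) would raise the device-mismatch error when `device` is CUDA;
# sending the indices to the same device as the weights fixes it.
out = embedding(tokens.to(device))
print(out.shape)   # torch.Size([8, 32, 16])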

PyOpenCL / OpenCL: Can't build program on GPU

Submitted by 喜你入骨 on 2021-02-10 20:21:58

Question: I have a piece of kernel source which runs on the G970 on my PC but won't compile on my early-2015 MacBook Pro with Iris 6100 1536 MB graphics.

platform = cl.get_platforms()[0]
device = platform.get_devices()[1]   # Get the GPU ID
ctx = cl.Context([device])           # Tell CL to use the GPU
queue = cl.CommandQueue(ctx)         # Create a command queue for the target device.
# program = cl.Program(ctx, kernelsource).build()
print platform.get_devices()

This get_devices() call shows I have 'Intel(R) Core(TM) i5-5287U CPU @ 2
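When a kernel builds on one device but not another, the driver's build log usually says why. Below is a hedged sketch of how to surface it with PyOpenCL; the kernel source is a trivial placeholder, not the asker's, and the device index is copied from the question.

import pyopencl as cl

kernelsource = """
__kernel void copy(__global const float *src, __global float *dst) {
    int i = get_global_id(0);
    dst[i] = src[i];
}
"""

platform = cl.get_platforms()[0]
device = platform.get_devices()[1]        # same selection as in the question
ctx = cl.Context([device])

program = cl.Program(ctx, kernelsource)
try:
    program.build()
except cl.Error:
    # The compiler's build log is far more specific than the Python exception.
    print(program.get_build_info(device, cl.program_build_info.LOG))
    raise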

Why is my CPU doing matrix operations faster than my GPU?

Submitted by 江枫思渺然 on 2021-02-10 15:38:32

Question: When I tried to verify that the GPU does matrix operations faster than the CPU, I got unexpected results. The CPU performs better than the GPU in my experiment, which confuses me. I used the CPU and the GPU to do matrix multiplication respectively. The programming environment is MXNet with cuda-10.1.

With GPU:

import mxnet as mx
from mxnet import nd

x = nd.random.normal(shape=(100000,100000), ctx=mx.gpu())
y = nd.random.normal(shape=(100000,100000), ctx=mx.gpu())
%timeit nd.dot(x,y)

50.8 µs ± 1.76 µs per
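For what it's worth, MXNet enqueues GPU work asynchronously, so %timeit on nd.dot alone tends to measure only the kernel launch, not the computation. A hedged sketch of a synchronized timing follows; the shapes are deliberately small, since a (100000, 100000) float32 matrix would need about 40 GB and two of them would not fit on a typical GPU.

import time
import mxnet as mx
from mxnet import nd

x = nd.random.normal(shape=(4096, 4096), ctx=mx.gpu())
y = nd.random.normal(shape=(4096, 4096), ctx=mx.gpu())
nd.waitall()                  # make sure the inputs are actually materialized

start = time.time()
z = nd.dot(x, y)              # returns almost immediately: the op is only enqueued
enqueue_time = time.time() - start

start = time.time()
nd.waitall()                  # block until the GPU has finished the matmul
compute_time = time.time() - start

print(enqueue_time, compute_time)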

How to install XGBoost on macOS compiled with GPU support?

Submitted by 浪尽此生 on 2021-02-10 12:48:26

Question: I have been trying to install xgboost with GPU support on macOS Mojave (10.14.6) for the last 3 days, with no success. I tried 2 approaches: pip install xgboost. xgboost installs this way and runs successfully without the GPU option (i.e., without tree_method='gpu_hist'). I want to run with gpu_hist by setting tree_method='gpu_hist' in the tree parameters. When I set tree_method='gpu_hist', the following error came up: XGBoostError: [12:10:34] /Users
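For reference, a minimal sketch of how the parameter in question is normally passed; the data and hyperparameters below are illustrative, and the code only works once xgboost has been built with CUDA support, which the plain pip wheel generally is not.

import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "binary:logistic", "tree_method": "gpu_hist"}

# Raises an XGBoostError similar to the one quoted above if the installed
# build has no GPU support.
bst = xgb.train(params, dtrain, num_boost_round=10)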

Trouble with slow speeds in OpenCL

Submitted by 筅森魡賤 on 2021-02-08 15:03:53

Question: I am trying to use OpenCL for the first time; the goal is to calculate the argmin of each row in an array. Since the operation on each row is independent of the others, I thought this would be easy to put on the graphics card. I seem to get worse performance using this code than when I just run the code on the CPU with an outer for-loop; any help would be appreciated. Here is the code:

#pragma OPENCL EXTENSION cl_khr_fp64 : enable

int argmin(global double *array, int end) {
    double minimum =
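As a point of comparison, a quick CPU baseline for the same row-wise argmin is sketched below (illustrative shapes only). It is useful both for checking correctness and for seeing whether the OpenCL version is actually ahead, since for modest row lengths the transfer and launch overhead of the GPU path can easily dominate.

import numpy as np

arr = np.random.rand(10000, 256)
row_argmin = np.argmin(arr, axis=1)   # index of the minimum element in every row
print(row_argmin[:5])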

Access pixels in gpu::mat

Submitted by 南楼画角 on 2021-02-08 10:50:53

Question: I'd like to know how to access pixel information when using OpenCV on the GPU. I'm currently downloading the gpu::mat information to a Mat variable, but it is too slow. Does anyone know how to do it?

回答1 (Answer 1): You could access the data inside a kernel. For row and column numbers (row, col), the value of a pixel with channel number ch < 3 is:

uint8_t val = gpumat.data[(row * gpumat.step) + col * gpumat.channels() + ch];

So let's say you have a BGR input image stored in a GpuMat called src and you want to
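To make the indexing concrete, here is a small NumPy illustration of the same arithmetic as the answer's formula: for a packed 8-bit BGR image, the byte offset of channel ch at (row, col) is row*step + col*channels + ch. The image, shape, and indices below are made up for the example.

import numpy as np

rows, cols, channels = 480, 640, 3
img = np.random.randint(0, 256, size=(rows, cols, channels), dtype=np.uint8)
step = img.strides[0]                 # bytes per row (no padding in this example)

row, col, ch = 100, 200, 1            # green channel of pixel (100, 200)
flat_index = row * step + col * channels + ch
assert img.reshape(-1)[flat_index] == img[row, col, ch]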