cudf

cuDF - Not leveraging GPU cores

Submitted by 六眼飞鱼酱① on 2021-02-11 15:16:55
Question: I am running the piece of code below in Python with cuDF to speed up the process, but I do not see any difference in speed compared to my 4-core local machine CPU. The GPU configuration is 4 x NVIDIA Tesla T4.

def arima(train):
    h = []
    for each in train:
        model = pm.auto_arima(np.array(ast.literal_eval(each)))
        p = model.predict(1).item(0)
        h.append(p)
    return h

for t_df in pd.read_csv("testset.csv", chunksize=1000):
    t_df = cudf.DataFrame.from_pandas(t_df)
    t_df['predicted'] = arima(t_df['prev_sales'])
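cuDF accelerates columnar DataFrame operations on the GPU; the per-row pm.auto_arima fits above are plain CPU Python, so converting each chunk with cudf.DataFrame.from_pandas mostly adds transfer overhead. Below is a minimal sketch, assuming the same pmdarima-based loop, of one way to actually occupy the four local CPU cores; the ProcessPoolExecutor approach and the fit_one helper are illustrative and not part of the original post.

import ast
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import pandas as pd
import pmdarima as pm


def fit_one(series_text):
    # Each row stores a stringified list of past sales; fit one model, forecast one step.
    model = pm.auto_arima(np.array(ast.literal_eval(series_text)))
    return model.predict(1).item(0)


if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        for t_df in pd.read_csv("testset.csv", chunksize=1000):
            # The per-row fits are independent, so fan them out across CPU processes.
            t_df["predicted"] = list(pool.map(fit_one, t_df["prev_sales"]))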

TypeError: data must be list or dict-like in CUDF

Submitted by 好久不见. on 2021-02-11 12:03:47
Question: I am implementing cuDF to speed up my Python process. First, I imported cuDF, removed the multiprocessing code, and initialized the variables with cuDF. After switching to cuDF it raises a dictionary error. How can I remove these loops to make the implementation effective? Code:

import more_itertools
import pandas as pd
import numpy as np
import itertools
from os import cpu_count
from sklearn.metrics import confusion_matrix, accuracy_score, roc_curve, auc
import matplotlib.pyplot as plt
import json
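As a minimal sketch of what the cuDF constructor expects (the column names here are made up for illustration and are not from the question): a dict of column name to list-like data builds a GPU DataFrame much as in pandas, while other objects can trigger the "data must be list or dict-like" TypeError mentioned above; CPU-only libraries such as sklearn.metrics still need the data converted back to host memory.

import cudf

# A dict of column-name -> list-like values constructs a GPU DataFrame, as in pandas.
gdf = cudf.DataFrame({"label": [0, 1, 1, 0], "score": [0.2, 0.8, 0.9, 0.1]})
print(gdf.head())

# CPU-only libraries (confusion_matrix, roc_curve, ...) need host data again.
pdf = gdf.to_pandas()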

Playing with GPUs Using Python

Submitted by 旧街凉风 on 2020-05-04 09:32:31
Question: As machine learning puts ever stronger demands on model computation speed, I have long wanted to do GPU programming, but for a long time that has been the preserve of C++. The thought of all the pitfalls in C++ is enough to drain my enthusiasm; with that much time spent filling pits back and forth, productivity takes a serious hit.

Solution: The good news is that, as the Python ecosystem keeps growing, GPU programming from Python has become more and more convenient. So which packages are available, and what can they do with the GPU? A few concrete code samples make it roughly clear.

First, pycuda. Here is one of its examples:

mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] * b[i];
}
""")

From the code above we can see that pycuda wraps the C++ code that drives the GPU so it can be used directly from Python.

Next, numba:

@cuda.jit
def increment_by_one(an_array):
    pos = cuda.grid(1)
    if pos < an_array.size:
        an_array[pos] += 1

We can see that numba goes a step further: a decorator makes invoking the GPU even more concise and convenient.

Next, cupy:

import numpy as np
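The cupy snippet above is cut off; as a stand-in (not the article's original example), here is a minimal sketch of the usual pattern: cupy mirrors the numpy API, so array code moves to the GPU largely by swapping the module.

import numpy as np
import cupy as cp

x_cpu = np.arange(10, dtype=np.float32)
x_gpu = cp.asarray(x_cpu)      # copy the host array onto the GPU
y_gpu = cp.sqrt(x_gpu) * 2.0   # computed on the device with the numpy-like API
print(cp.asnumpy(y_gpu))       # copy the result back to the host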

How to read a single large parquet file into multiple partitions using dask/dask-cudf?

Submitted by ∥☆過路亽.° on 2020-02-05 13:09:23
Question: I am trying to read a single large Parquet file (size > gpu_size) using dask_cudf / dask, but it is currently read into a single partition, which I am guessing is the expected behavior, inferring from the docstring:

dask.dataframe.read_parquet(path, columns=None, filters=None, categories=None, index=None, storage_options=None, engine='auto', gather_statistics=None, **kwargs):
    Read a Parquet file into a Dask DataFrame
    This reads a directory of Parquet data into a Dask.dataframe, one file
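As a hedged sketch of one way this is commonly handled (assuming the file contains several Parquet row groups; the file name below is a placeholder), dask_cudf can be asked to split on row groups rather than produce one partition per file:

import dask_cudf

# split_row_groups=True requests roughly one partition per Parquet row group
# instead of one partition per file, so a single large file can be spread
# over several GPU-sized chunks.
ddf = dask_cudf.read_parquet("large_file.parquet", split_row_groups=True)
print(ddf.npartitions)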