Question
I am running the below piece of code in Python with cuDF to speed up the process, but I do not see any difference in speed compared to my 4-core local machine CPU. The GPU configuration is 4 x NVIDIA Tesla T4.
import ast
import cudf
import numpy as np
import pandas as pd
import pmdarima as pm

def arima(train):
    h = []
    for each in train:
        # each value is a list serialized as a string, hence ast.literal_eval
        model = pm.auto_arima(np.array(ast.literal_eval(each)))
        p = model.predict(1).item(0)
        h.append(p)
    return h

for t_df in pd.read_csv("testset.csv", chunksize=1000):
    t_df = cudf.DataFrame.from_pandas(t_df)
    t_df['predicted'] = arima(t_df['prev_sales'])
What am I missing here?
Answer 1:
While I'll help you with your issue of not using all of the GPUs, let me first share a performance tip: if all of your data fits on a single GPU, you should stick with single-GPU processing using cudf, as it is much faster and doesn't require any orchestration overhead. If not, then read on :)
The reason you're not utilizing the 4 GPUs is that you're not using dask-cudf. cudf is a single-GPU library; dask-cudf lets you scale out to multiple GPUs and multiple nodes, or process datasets larger than GPU memory.
Here is a great place to start: https://docs.rapids.ai/api/cudf/stable/10min.html
As for your speed issue, you should be reading the CSV directly into the GPU through cudf, if possible. In your code, you're reading the data twice - once to host [CPU] with pandas and once to cudf [GPU] from pandas. That's unnecessary, and you lose all the benefit of GPU acceleration on the read. On large datasets, cudf will give you a pretty nice file-read speedup compared to pandas.
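For example, here is a minimal sketch of that single-GPU path (this also covers the tip above about staying on one GPU; it assumes the whole file fits in a single T4's memory and reuses the testset.csv / prev_sales names from your question):

import cudf

# Read straight into GPU memory - no pandas read followed by from_pandas
gdf = cudf.read_csv("testset.csv")

# Column operations now run on the GPU; copy to host only if a CPU-only
# library (such as pmdarima) actually needs the values
prev_sales_host = gdf["prev_sales"].to_pandas()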
import dask_cudf
df = dask_cudf.read_csv("testset.csv")
df = df.repartition(npartitions=4)  # or whatever multiple of the # of GPUs that you have
and then go from there. Be sure to set up a client: https://docs.rapids.ai/api/cudf/stable/10min.html#Dask-Performance-Tips. That information is on the same page linked above. No for loops required :)
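A minimal sketch of that setup, assuming the dask-cuda package is installed alongside RAPIDS (its LocalCUDACluster is what puts one worker on each of your 4 T4s):

from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf

# One worker per GPU; create the cluster and client before scheduling any dask-cudf work
cluster = LocalCUDACluster()
client = Client(cluster)

df = dask_cudf.read_csv("testset.csv")
print(df.head())  # head()/compute() trigger execution across the GPU workers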
For the rest of it, I am assuming that you're using cuML for your machine learning algos, like ARIMA: https://docs.rapids.ai/api/cuml/stable/api.html?highlight=arima#cuml.tsa.ARIMA. Here is an example notebook: https://github.com/rapidsai/cuml/blob/branch-0.14/notebooks/arima_demo.ipynb
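A rough sketch of what the batched cuml.tsa.ARIMA API looks like (random-walk data purely for illustration; unlike pm.auto_arima, you pick the (p, d, q) order yourself, and the whole batch of series is fit in a single call rather than one Python-loop iteration per row):

import numpy as np
from cuml.tsa import ARIMA

# endog is (n_observations, n_series): one column per series in the batch
rng = np.random.default_rng(0)
endog = rng.standard_normal((100, 8)).cumsum(axis=0)  # 8 toy random-walk series

model = ARIMA(endog, order=(1, 1, 1), fit_intercept=True)
model.fit()
forecast = model.forecast(1)  # one step ahead for every series, shape (1, 8)

One way to combine this with dask-cudf would be to run it per partition (e.g. via map_partitions), so each GPU fits its own batch of series.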
Source: https://stackoverflow.com/questions/61345809/cudf-not-leveraging-gpu-cores