TypeError: data must be list or dict-like in CUDF

好久不见. Submitted on 2021-02-11 12:03:47

Question


I am using CUDF to speed up my Python process. First, I imported CUDF, removed the multiprocessing code, and initialized the variables with CUDF. After switching to CUDF, the code raises the TypeError shown in the traceback below.

How can I remove these loops to make the implementation more efficient?

Code

import more_itertools
import pandas as pd
import numpy as np
import itertools
from os import cpu_count
from pathlib import Path
from sklearn.metrics import confusion_matrix, accuracy_score, roc_curve, auc
import matplotlib.pyplot as plt
import json
import os
import gc
from tqdm import tqdm
import cudf

gc.collect()
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
import logging

mpl_logger = logging.getLogger('matplotlib')
mpl_logger.setLevel(logging.WARNING)

with open(Path(__file__).parent / "ageDB.json", "r") as f:
    identities = json.load(f)

positives = cudf.DataFrame()

for value in tqdm(identities.values(), desc="Positives"):
    positives = positives.append(cudf.DataFrame(itertools.combinations(value, 2), columns=["file_x", "file_y"]),
                                 ignore_index=True)

positives["decision"] = "Yes"
print(positives)

samples_list = list(identities.values())
negatives = cudf.DataFrame()


######################====================Functions=============##############

def compute_cross_samples(x):
    return cudf.DataFrame(itertools.product(*x), columns=["file_x", "file_y"])

####################################
if Path("positives_negatives.csv").exists():
    df = cudf.read_csv("positives_negatives.csv")
else:
    for combos in tqdm(more_itertools.ichunked(itertools.combinations(identities.values(), 2), cpu_count())):
        for cross_samples in (compute_cross_samples, combos):
            negatives = negatives.append(cross_samples)

negatives["decision"] = "No"
negatives = negatives.sample(positives.shape[0])
df = cudf.concat([positives, negatives]).reset_index(drop=True)
df.to_csv("positives_negatives.csv", index=False)

df.file_x = "deepface/tests/dataset/" + df.file_x
df.file_y = "deepface/tests/dataset/" + df.file_y

Traceback

Traceback (most recent call last):
  File "Ensemble-Face-Recognition.py", line 36, in <module>
    positives = positives.append(cudf.DataFrame(itertools.combinations(value, 2), columns=["file_x", "file_y"]),
  File "/home/khawar/anaconda3/envs/rapids-0.17/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/khawar/anaconda3/envs/rapids-0.17/lib/python3.7/site-packages/cudf/core/dataframe.py", line 289, in __init__
    raise TypeError("data must be list or dict-like")
TypeError: data must be list or dict-like

Answer 1:


itertools.combinations returns a lazy iterator rather than a list, so you need to call list() on it explicitly to get a list-like value:

cudf.DataFrame(list(itertools.combinations(value, 2)) . . .  
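To see why the explicit list() matters, here is a minimal sketch using only the standard library. The file names are hypothetical placeholders, not values from ageDB.json.

```python
import itertools

value = ["a.jpg", "b.jpg", "c.jpg"]  # hypothetical file names for one identity

combos = itertools.combinations(value, 2)

# The object is a lazy iterator, not a list, which is why a
# "list or dict-like" check rejects it.
is_list = isinstance(combos, list)

# list() materializes the pairs into something list-like.
pairs = list(combos)

# The iterator is now exhausted, so a second pass yields nothing.
leftover = list(combos)
```

Because the iterator can only be consumed once, it is safest to materialize it right where it is created and pass the resulting list on.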

As an aside, I'm not sure whether this holds in cudf, but in pandas it is much faster to build a list of dataframes and concatenate them once at the end than it is to start from an empty dataframe and append to it repeatedly. Your loop reassigns positives to a newly appended dataframe on every iteration.
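One way to avoid the repeated append, sketched here as a variation on that advice: collect all the pairs in plain Python first and build the dataframe in a single call. The function names, the frame_lib parameter, and the sample data are made up for this example. It also assumes that cudf.DataFrame accepts a list of tuples with columns= the same way pandas does, which I have not benchmarked or verified on the GPU.

```python
import itertools


def positive_pairs(identities):
    """Return every within-identity pair as one flat list of (file_x, file_y) tuples."""
    return [
        pair
        for files in identities.values()
        for pair in itertools.combinations(files, 2)
    ]


def pairs_to_frame(pairs, frame_lib):
    """Build the frame once. frame_lib is the dataframe module (cudf or pandas);
    passing it in as an argument is a choice made for this sketch only."""
    frame = frame_lib.DataFrame(pairs, columns=["file_x", "file_y"])
    frame["decision"] = "Yes"
    return frame


# Hypothetical sample data standing in for the contents of ageDB.json
identities = {
    "alice": ["a1.jpg", "a2.jpg", "a3.jpg"],
    "bob": ["b1.jpg", "b2.jpg"],
}
pairs = positive_pairs(identities)
```

With this in place, positives = pairs_to_frame(pairs, cudf) would replace the whole "Positives" loop, and passing pandas instead of cudf should give the CPU equivalent.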




Answer 2:


Formatting your last comment:

positives = positives.append(cudf.DataFrame(
   list(itertools.combinations(value, 2), columns=["file_x", "file_y"]), 

TypeError: list() takes no keyword arguments 

Because of where the parentheses close, columns is passed as a keyword argument to list(), not to cudf.DataFrame.
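A small sketch of the difference, again with hypothetical file names. The DataFrame call itself is left as a comment so the snippet runs without a GPU.

```python
import itertools

value = ["a.jpg", "b.jpg", "c.jpg"]  # hypothetical file names

# Misplaced parenthesis: columns= ends up inside list(...), which rejects it.
error = None
try:
    list(itertools.combinations(value, 2), columns=["file_x", "file_y"])
except TypeError as exc:
    error = exc

# Closing list(...) first leaves columns= free to go to the DataFrame call.
pairs = list(itertools.combinations(value, 2))
# positives = cudf.DataFrame(pairs, columns=["file_x", "file_y"])
```

Assigning the list to its own variable first, as above, makes the pairing of parentheses much harder to get wrong.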



Source: https://stackoverflow.com/questions/66073491/typeerror-data-must-be-list-or-dict-like-in-cudf
