Saving oversampled dataset as csv file in pandas

孤人 submitted on 2020-12-15 03:42:39

Question


I am new to Python and apologize in advance if this is too simple. I cannot find anything, and this question did not help.

My code is

from collections import Counter
from imblearn.over_sampling import SMOTE

# Split data into features (X) and target (y, the last column, 'label')
y = starbucks_smote.iloc[:, -1]
X = starbucks_smote.drop('label', axis = 1)

# Count labels by type
counter = Counter(y)
print(counter)
# Output: Counter({0: 9634, 1: 2895})

# Transform the dataset
oversample = SMOTE()
X, y = oversample.fit_resample(X, y)

# Count labels in the oversampled dataset
counter = Counter(y)
print(counter)
# Output: Counter({0: 9634, 1: 9634})

How can I save the oversampled dataset for future work?

I tried

data_res = np.concatenate((X, y), axis = 1)
data_res.to_csv('sample_smote.csv')

This raised an error:

ValueError: all the input arrays must have same number of dimensions, 
but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

Appreciate any tips!


Answer 1:


You can build a DataFrame from X and add y as a new column:

data_res = pd.DataFrame(X)
data_res['y'] = y

and then save data_res to CSV with its to_csv method.
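The saving step can be sketched as follows. This is a minimal example, not the asker's actual data: the small X and y, the column names f1, f2 and label, and the temporary output directory are all assumptions made for illustration. It also assumes X and y behave like the DataFrame and Series that fit_resample appears to return when given pandas inputs in recent imbalanced-learn versions.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the resampled X and y (not the real Starbucks data)
X = pd.DataFrame({"f1": [0.1, 0.2, 0.3, 0.4], "f2": [1.0, 2.0, 3.0, 4.0]})
y = pd.Series([0, 1, 0, 1], name="label")

# Combine features and target into one DataFrame
data_res = pd.DataFrame(X)
data_res["label"] = y.to_numpy()  # assign by position, ignoring y's index

# Write to CSV; index=False keeps the row index out of the file
out_path = os.path.join(tempfile.mkdtemp(), "sample_smote.csv")
data_res.to_csv(out_path, index=False)

# Read it back to confirm the round trip
reloaded = pd.read_csv(out_path)
print(reloaded.head())
```

Passing index=False is a design choice: without it, to_csv writes the row index as an extra unnamed column, which then shows up as a spurious feature when the file is read back.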

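A solution based on concatenating NumPy arrays is also possible, but y must first be turned into a column (for example with np.vstack) so that both arrays are 2-dimensional. The result is then wrapped in a DataFrame, because a NumPy array has no to_csv method: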

data_res = np.concatenate((X, np.vstack(y)), axis = 1)
data_res = pd.DataFrame(data_res)
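The dimension mismatch behind the ValueError can be seen on a small example. The arrays and column names below are made up for illustration, and reshape and np.column_stack are shown as alternatives to np.vstack rather than as the answer's own method.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: X is 2-D with shape (3, 2), y is 1-D with shape (3,)
X = np.array([[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]])
y = np.array([0, 1, 0])

# np.concatenate((X, y), axis=1) would raise the ValueError from the question,
# because y has one dimension fewer than X.

# Turn y into a column of shape (3, 1), then concatenate along axis 1
y_col = y.reshape(-1, 1)
data_res = np.concatenate((X, y_col), axis=1)

# np.column_stack accepts the 1-D y directly and gives the same array
same = np.column_stack((X, y))

# Wrap in a DataFrame so that to_csv becomes available
df = pd.DataFrame(data_res, columns=["f1", "f2", "label"])
print(df.shape)
```

One side effect to keep in mind: a NumPy array has a single dtype, so after concatenation with float features the integer labels are stored as floats. Building the DataFrame column by column, as in the first approach, keeps the label column's integer dtype.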


Source: https://stackoverflow.com/questions/63556933/saving-oversampled-dataset-as-csv-file-in-pandas
