Saving oversampled dataset as csv file in pandas

Question

I am new to Python, so apologies in advance if this is too simple. I could not find anything, and this question did not help.

My code is

from collections import Counter
from imblearn.over_sampling import SMOTE

# Split data: the label is the last column, the features are everything else
y = starbucks_smote.iloc[:, -1]
X = starbucks_smote.drop('label', axis=1)

# Count labels by type
counter = Counter(y)
print(counter)
# Counter({0: 9634, 1: 2895})

# Transform the dataset
oversample = SMOTE()
X, y = oversample.fit_resample(X, y)

# Print the label counts of the oversampled dataset
counter = Counter(y)
print(counter)
# Counter({0: 9634, 1: 9634})

How can I save the oversampled dataset for future work?

I tried

data_res = np.concatenate((X, y), axis = 1)
data_res.to_csv('sample_smote.csv')

but got an error:

ValueError: all the input arrays must have same number of dimensions, 
but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

Appreciate any tips!

Answer 1:

You can build a DataFrame from X and add y as a new column:

data_res = pd.DataFrame(X)
data_res['y'] = y

and then save data_res to CSV with data_res.to_csv('sample_smote.csv').
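As a rough sketch of the full round trip, the example below builds the combined DataFrame, writes it as CSV, and reads it back. The small X and y here are made-up stand-ins for the resampled data, the column names f1, f2 and label are assumptions, and an in-memory buffer replaces the file so the snippet runs on its own. Passing index=False keeps the row index out of the CSV, which is usually what you want when reloading later.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the X and y returned by fit_resample
X = pd.DataFrame({"f1": [0.1, 0.2, 0.3, 0.4], "f2": [1.0, 2.0, 3.0, 4.0]})
y = pd.Series([0, 1, 0, 1], name="label")

# Combine features and labels into one DataFrame
data_res = X.copy()
data_res["label"] = y

# Write as CSV without the row index; a file path such as
# 'sample_smote.csv' can be used in place of the buffer
buffer = io.StringIO()
data_res.to_csv(buffer, index=False)

# Read it back to check that nothing was lost
buffer.seek(0)
restored = pd.read_csv(buffer)
```

If X comes back from fit_resample as a NumPy array rather than a DataFrame, pd.DataFrame(X) gives integer column names, so you may want to pass columns= explicitly.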

A solution based on concatenating NumPy arrays is also possible, but y is one-dimensional while X is two-dimensional, so y must first be turned into a column. np.vstack does this and makes the dimensions match:

data_res = np.concatenate((X, np.vstack(y)), axis = 1)
data_res = pd.DataFrame(data_res)
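For illustration, here is a minimal sketch of the same idea using reshape(-1, 1) and np.column_stack, two other common ways to turn a 1-D label array into a column. The arrays and the column names f1, f2 and label are made up for the example. Note that stacking float features with integer labels upcasts the labels to float.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: 2-D features and 1-D labels
X_arr = np.array([[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]])
y_arr = np.array([0, 1, 0])

# Reshape y from shape (3,) to (3, 1) so it can be joined along axis 1
stacked = np.concatenate((X_arr, y_arr.reshape(-1, 1)), axis=1)

# column_stack treats 1-D inputs as columns and gives the same result
same = np.column_stack((X_arr, y_arr))

# Wrap in a DataFrame so that to_csv is available
data_res = pd.DataFrame(stacked, columns=["f1", "f2", "label"])
```

The DataFrame step matters because a plain NumPy array has no to_csv method, which is the second problem in the attempted code.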

Source: https://stackoverflow.com/questions/63556933/saving-oversampled-dataset-as-csv-file-in-pandas

Tags

python

pandas

numpy

resampling

smote
