Question
I am new to Python and apologize in advance if this is too simple. I could not find anything, and this question did not help.
My code is
from collections import Counter
from imblearn.over_sampling import SMOTE

# Split data: the last column is the label, the rest are features
y = starbucks_smote.iloc[:, -1]
X = starbucks_smote.drop('label', axis=1)
# Count labels by type
counter = Counter(y)
print(counter)
# Output: Counter({0: 9634, 1: 2895})
# Transform the dataset
oversample = SMOTE()
X, y = oversample.fit_resample(X, y)
# Print the label counts of the oversampled dataset
counter = Counter(y)
print(counter)
# Output: Counter({0: 9634, 1: 9634})
How can I save the oversampled dataset for future work?
I tried:
data_res = np.concatenate((X, y), axis = 1)
data_res.to_csv('sample_smote.csv')
but got an error:
ValueError: all the input arrays must have same number of dimensions,
but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
I would appreciate any tips!
Answer 1:
You can build a DataFrame from X and add y as a column:
data_res = pd.DataFrame(X)
data_res['y'] = y
and then save data_res to CSV with data_res.to_csv('sample_smote.csv').
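The DataFrame approach can be sketched end to end as follows. This is a minimal example, not the asker's actual data: the small X and y stand in for the SMOTE output, and the column names f1, f2 and label as well as the temporary file location are assumptions made for illustration. Passing index=False keeps the row index out of the file, so reading it back yields the same columns.

```python
import os
import tempfile

import pandas as pd

# Stand-ins for the resampled X (DataFrame) and y (Series); the values are made up.
X = pd.DataFrame({"f1": [0.1, 0.2, 0.3, 0.4], "f2": [1.0, 2.0, 3.0, 4.0]})
y = pd.Series([0, 1, 0, 1], name="label")

# Copy X so the feature frame itself is not modified, then attach the labels.
data_res = X.copy()
data_res["label"] = y

# Write without the row index, then read the file back as a check.
csv_path = os.path.join(tempfile.mkdtemp(), "sample_smote.csv")
data_res.to_csv(csv_path, index=False)
restored = pd.read_csv(csv_path)
print(restored.shape)
```

Without index=False, to_csv also writes the row index as an extra first column, which then shows up as an unnamed column when the file is read back.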
A solution based on concatenating NumPy arrays is also possible, but np.vstack is needed to turn y into a column so that the dimensions match:
data_res = np.concatenate((X, np.vstack(y)), axis = 1)
data_res = pd.DataFrame(data_res)
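The dimension mismatch behind the error can be seen on a small example. The sketch below uses made-up arrays and uses reshape(-1, 1) in place of np.vstack, which is a swapped-in alternative that gives the same column shape; np.column_stack is another option that handles the 1-D array directly. The column names are assumptions. Note that a NumPy array has no to_csv method, so wrapping the result in a DataFrame is still needed before saving.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins: X is 2-D with shape (3, 2), y is 1-D with shape (3,).
X = np.array([[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]])
y = np.array([0, 1, 0])

# Turn y into a column of shape (3, 1) so both inputs are 2-D.
y_col = y.reshape(-1, 1)
stacked = np.concatenate((X, y_col), axis=1)

# np.column_stack accepts the 1-D y as-is and builds the same array.
same = np.column_stack((X, y))

# The combined array is float, so the labels are stored as 0.0 and 1.0.
# Column names are hypothetical; a plain array carries none of its own.
data_res = pd.DataFrame(stacked, columns=["f1", "f2", "label"])
print(data_res.shape)
```

Because the array round trip drops the original column names and converts the labels to floats, the DataFrame approach is usually the more convenient one when X is already a DataFrame.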
Source: https://stackoverflow.com/questions/63556933/saving-oversampled-dataset-as-csv-file-in-pandas