Question
I am plotting a 2D decision-region plot for an SVC with binary (Bernoulli) output.
The text was converted to vectors with average word2vec, standardised, and split into train and test sets. Through grid search I found the best C and gamma (RBF kernel):
clf = SVC(C=100,gamma=0.0001)
clf.fit(X_train1,y_train)
from mlxtend.plotting import plot_decision_regions
plot_decision_regions(X_train, y_train, clf=clf, legend=2)
plt.xlabel(X.columns[0], size=14)
plt.ylabel(X.columns[1], size=14)
plt.title('SVM Decision Region Boundary', size=16)
I receive the error: ValueError: y must be a NumPy array. Found
I also tried converting y to a NumPy array. It then raises: ValueError: y must be an integer array. Found object. Try passing the array as y.astype(np.integer)
Finally, I converted it to an integer array. Now it raises: ValueError: Filler values must be provided when X has more than 2 training features.
Answer 1:
I spent some time on this too, because plot_decision_regions then complained: ValueError: Column(s) [2] need to be accounted for in either feature_index or filler_feature_values. One more parameter is needed to avoid this.
So, say, you have 4 features and they come unnamed:
X_train_std.shape[1] = 4
We can refer to each feature by its index: 0, 1, 2, 3. You can only plot 2 features at a time; say you want 0 and 2.
You'll need to specify one additional parameter (to those specified in @sos.cott's answer), feature_index, and fill the rest with fillers:
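value = 1.5
width = 0.75
# plot_decision_regions returns a matplotlib Axes
ax = plot_decision_regions(X_train.values, y_train.values, clf=clf,
                           feature_index=[0, 2],                        # the two features to plot
                           filler_feature_values={1: value, 3: value},  # the other features are held at these values
                           filler_feature_ranges={1: width, 3: width})  # only samples this close to the filler values are drawn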
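To make the role of the fillers concrete, here is a minimal NumPy-only sketch of the idea as I understand it: the plot is a 2D slice through feature space, with every non-plotted feature pinned to its filler value, and only samples lying close to that slice are drawn. The helpers slice_grid and near_slice_mask are hypothetical names for illustration, not part of mlxtend, and this is not mlxtend's actual implementation.

```python
import numpy as np


def slice_grid(X, feature_index, filler_values, resolution=50):
    """Grid of points that varies two features and pins the others to filler values."""
    i, j = feature_index
    missing = set(range(X.shape[1])) - {i, j} - set(filler_values)
    if missing:
        # Every feature must be either plotted or given a filler value.
        raise ValueError(f"Column(s) {sorted(missing)} need a filler value")
    xs = np.linspace(X[:, i].min(), X[:, i].max(), resolution)
    ys = np.linspace(X[:, j].min(), X[:, j].max(), resolution)
    xx, yy = np.meshgrid(xs, ys)
    grid = np.zeros((xx.size, X.shape[1]))
    grid[:, i] = xx.ravel()
    grid[:, j] = yy.ravel()
    for k, v in filler_values.items():
        grid[:, k] = v  # constant along the whole slice
    return grid


def near_slice_mask(X, filler_values, filler_ranges):
    """Boolean mask of samples whose non-plotted features lie within the given ranges."""
    mask = np.ones(len(X), dtype=bool)
    for k, v in filler_values.items():
        mask &= np.abs(X[:, k] - v) < filler_ranges[k]
    return mask


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))  # synthetic stand-in for 4 standardised features
grid = slice_grid(X_demo, (0, 2), {1: 1.5, 3: 1.5}, resolution=10)
mask = near_slice_mask(X_demo, {1: 1.5, 3: 1.5}, {1: 0.75, 3: 0.75})
print(grid.shape, int(mask.sum()))
```

A classifier's predict on such a grid gives the coloured regions, and the mask selects which training points are overlaid. With standardised data, a filler value far from 0 can leave very few points inside the range.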
Answer 2:
You can use PCA to reduce your multi-dimensional data to two dimensions. Then pass the result to plot_decision_regions, and there will be no need for filler values.
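import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from mlxtend.plotting import plot_decision_regions

clf = SVC(C=100, gamma=0.0001)
pca = PCA(n_components=2)
X_train2 = pca.fit_transform(X_train)  # project onto the first two principal components
clf.fit(X_train2, y_train)             # the classifier is refit on the 2D projection
# y_train must already be an integer NumPy array here
plot_decision_regions(X_train2, y_train, clf=clf, legend=2)
# the axes are principal components, not the original columns
plt.xlabel('Principal component 1', size=14)
plt.ylabel('Principal component 2', size=14)
plt.title('SVM Decision Region Boundary', size=16)
plt.show()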
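One thing to keep in mind with this approach is that the plotted boundary belongs to a new classifier trained on two components, not to the original model trained on all features. A rough sketch of how to check how much is lost, using synthetic data from make_classification as a stand-in for the word2vec features (the dataset and variable names are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data: 10 features, binary labels.
X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# Model trained on all features.
full_clf = SVC(C=100, gamma=0.0001).fit(X_tr, y_tr)

# Fit PCA on the training split only, then apply the same projection to the test split.
pca = PCA(n_components=2).fit(X_tr)
X_tr_2d, X_te_2d = pca.transform(X_tr), pca.transform(X_te)
pca_clf = SVC(C=100, gamma=0.0001).fit(X_tr_2d, y_tr)

acc_full = full_clf.score(X_te, y_te)
acc_2d = pca_clf.score(X_te_2d, y_te)
print(f"all features: {acc_full:.3f}, 2 components: {acc_2d:.3f}")
```

If the two scores differ a lot, the 2D picture is not a faithful view of the original model, and C and gamma probably need to be tuned again for the reduced data.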
Answer 3:
For the NumPy array problem, you can just do the following (assuming X_train and y_train are still pandas objects):
plot_decision_regions(X_train.values, y_train.values, clf=clf, legend=2)
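If the labels are stored as strings, .values still gives an object array and the "must be an integer array" error remains. A small sketch of one way to map labels to integer codes with NumPy; the label values here are made up for illustration:

```python
import numpy as np

# Hypothetical string labels with object dtype, for illustration only.
y_raw = np.array(["neg", "pos", "pos", "neg"], dtype=object)

# np.unique returns the sorted distinct labels; return_inverse=True also
# returns, for each sample, the index of its label in that sorted list.
classes, y_int = np.unique(y_raw, return_inverse=True)

print(classes)      # distinct labels, sorted
print(y_int)        # integer code per sample
print(y_int.dtype)  # an integer dtype
```

If the labels are already numbers that merely ended up with object dtype, y_train.values.astype(int) should be enough. Note that the classifier should be fitted on the same integer labels that are passed to the plot.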
For the filler values issue, you have to supply a filler value for every feature that is not plotted, so you do the following:
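value = 1.5
width = 0.75
fig, ax = plt.subplots()  # create the axes that is passed as ax below
# assumes 5 features: 0 and 1 are plotted by default, 2 to 4 get fillers
plot_decision_regions(X_train.values, y_train.values, clf=clf,
                      filler_feature_values={2: value, 3: value, 4: value},
                      filler_feature_ranges={2: width, 3: width, 4: width},
                      legend=2, ax=ax)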
You need to add one filler entry for each feature that is not being plotted.
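With many features, writing these dictionaries by hand gets tedious. Here is a sketch of a small helper that builds them for every non-plotted column. The name make_fillers is hypothetical, and using the median as filler value and one standard deviation as range is my own assumption, not something mlxtend prescribes:

```python
import numpy as np


def make_fillers(X, feature_index=(0, 1)):
    """Hypothetical helper: build filler dicts for every feature not in feature_index."""
    X = np.asarray(X, dtype=float)
    others = [k for k in range(X.shape[1]) if k not in feature_index]
    # Assumption: pin each unplotted feature at its median ...
    values = {k: float(np.median(X[:, k])) for k in others}
    # ... and keep samples within one standard deviation of that value.
    ranges = {k: float(np.std(X[:, k])) for k in others}
    return values, ranges


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))  # synthetic stand-in for 5 standardised features
values, ranges = make_fillers(X_demo, feature_index=(0, 1))
print(sorted(values))  # [2, 3, 4]
```

The two dictionaries can then be passed as filler_feature_values=values and filler_feature_ranges=ranges.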
Source: https://stackoverflow.com/questions/52952310/plot-decision-regions-with-error-filler-values-must-be-provided-when-x-has-more