Question
I am trying to plot a decision region (based on the output of a logistic regression) with matplotlib's contourf function. The code I am using:
subplot.contourf(x2, y2, P, cmap=cmap_light, alpha = 0.8)
where x2 and y2 are two 2D matrices generated via numpy meshgrids. P is computed using
P = clf.predict(numpy.c_[x2.ravel(), y2.ravel()])
P = P.reshape(x2.shape)
Each element of P is a boolean value based on the output of the logistic regression. The rendered plot looks like this: [image not included]
My question is how does the contourf function know where to draw the contour based on a 2D matrix of boolean values? (x2, y2 are just numpy meshgrids) I looked up the docs several times but could not understand.
Answer 1:
To illustrate what's happening, here is an example using the first two features (sepal length and width) of the iris dataset.
First, the regression is calculated from the given data (dots with black outline). Then, for each point of a grid covering the data, a prediction is calculated (small dots in a grid). Note that the given and predicted values are just the numbers 0, 1 and 2. (In the question, only 0 and 1 are used.)
The last step uses these grid points as input to search for the contours of regions with an equal predicted value. So, one contour line is drawn between the grid points that have value 0 and the ones with value 1, and another between values 1 and 2. contourf then fills the area between the lines with a uniform color.
As the grid points and their predictions aren't visualized in the question's plot, the abrupt contours are harder to understand.
from matplotlib import pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Use only the first two features (sepal length and width)
X, y = load_iris(return_X_y=True)
X = X[:, :2]
clf = LogisticRegression(random_state=0).fit(X, y)

# 20x20 grid covering the data, with a margin of 0.5 on each side
x2, y2 = np.meshgrid(np.linspace(X[:, 0].min() - .5, X[:, 0].max() + .5, 20),
                     np.linspace(X[:, 1].min() - .5, X[:, 1].max() + .5, 20))
# Predicted class (0, 1 or 2) for each of the 400 grid points
pred = clf.predict(np.c_[x2.ravel(), y2.ravel()])

cmap = plt.get_cmap('Set1', 3)
# Small dots: predictions on the grid; large outlined dots: the given data
plt.scatter(x2.ravel(), y2.ravel(), c=pred, s=10, cmap=cmap, label='Prediction on grid')
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap=cmap, ec='black', label='Given values')
# Filled contours need the predictions in the same 2D layout as the grid
plt.contourf(x2, y2, pred.reshape(x2.shape), cmap=cmap, alpha=0.4, levels=2, zorder=0)
plt.legend(ncol=2, loc="lower center", bbox_to_anchor=(0.5, 1.01))
plt.show()
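To make the "line between grid points" idea concrete, here is a minimal sketch of linear interpolation along a single grid row. It is only an illustration of the principle, not matplotlib's actual contouring code. The helper level_crossings, the 5-point row and the level of 0.5 are all assumptions made up for this example; the levels matplotlib really uses depend on the levels argument.

```python
import numpy as np

# Hypothetical 1D slice through a prediction grid:
# grid x coordinates 0, 1, 2, 3, 4 with predicted labels 0, 0, 1, 1, 1.
x = np.linspace(0.0, 4.0, 5)
z = (x >= 2).astype(float)  # booleans treated as 0.0 / 1.0


def level_crossings(x, z, level=0.5):
    """Return the x positions where linearly interpolated z crosses `level`."""
    crossings = []
    for x0, x1, z0, z1 in zip(x[:-1], x[1:], z[:-1], z[1:]):
        # A crossing exists only if the level lies strictly between the neighbours.
        if (z0 - level) * (z1 - level) < 0:
            t = (level - z0) / (z1 - z0)  # fraction of the way from x0 to x1
            crossings.append(float(x0 + t * (x1 - x0)))
    return crossings


crossings = level_crossings(x, z)
print(crossings)  # [1.5]
```

The values only change between x=1 and x=2, so that is the only place a boundary can go, and for a level of 0.5 it lands halfway between the two grid points. With a coarse grid this produces the blocky, abrupt-looking boundaries seen in the question; a finer grid moves the boundary closer to the classifier's true decision line.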
PS: About pred.reshape(x2.shape):
- x2 and y2 are arrays giving the x and y coordinates of each grid point. They are organized as 2D arrays shaped like the grid they represent (20x20 in the example).
- However, clf.predict needs one row per point. To that end, .ravel() is used: it just makes one long 1D array out of each 2D array. In the example, ravel converts the 20x20 arrays to 1D arrays of 400 elements, which np.c_ then stacks as the two columns of a 400x2 array.
- The result of pred = clf.predict(...) is a corresponding 1D array (400 elements). pred.reshape(x2.shape) converts pred to the same 2D format as x2 and y2 (again 20x20).
- Note that scatter wants its parameters in 1D format, as it only looks at each point individually. contourf, on the other hand, wants its parameters in 2D format, as it needs to know how the grid is organized.
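The shape bookkeeping above can be checked on a tiny grid. This is a minimal sketch using a 3x4 grid instead of 20x20; the threshold rule standing in for clf.predict is a made-up placeholder, not a real classifier.

```python
import numpy as np

# Small grid: 4 x values and 3 y values give 2D arrays of shape (3, 4).
x2, y2 = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 3))
print(x2.shape)  # (3, 4)

# Flatten both coordinate grids and stack them as columns: one row per point.
points = np.c_[x2.ravel(), y2.ravel()]
print(points.shape)  # (12, 2)

# Placeholder for clf.predict (an assumption for illustration only):
# label a point 1 when its x coordinate is above 0.5.
pred = (points[:, 0] > 0.5).astype(int)
print(pred.shape)  # (12,)

# Back to the grid layout, so grid[i, j] belongs to (x2[i, j], y2[i, j]).
grid = pred.reshape(x2.shape)
print(grid)
```

Because ravel and reshape use the same row-by-row order, every label ends up at the grid position it was computed for, which is what contourf relies on.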
Source: https://stackoverflow.com/questions/63234019/explain-matplotlib-contourf-function