def plot_decision_regions(X, y, classifier, resolution=0.02):
    # setup marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = (
A ListedColormap is a colormap built from a list of colors. Such a colormap is useful for showing discrete color levels, e.g. in an image plot (imshow or pcolormesh), in other 2D plots like tripcolor, or in a scatter plot. Contour plots can also take colormaps, and using a ListedColormap is just one option for showing the different contour levels in different colors.
If you already have a list of colors, you may also use this list of colors directly for your contour plot. Both options are available and the advantage of the colormap would in this case only be that you can easily create a colorbar for your plot.
See below for a comparison between directly using a list of colors and using a colormap.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.colors
x = np.linspace(-3,3)
X,Y = np.meshgrid(x,x)
Z = np.exp(-(X**2+Y**2))
fig, (ax, ax2) = plt.subplots(ncols=2)
colors=["red", "orange", "gold", "limegreen", "k",
"#550011", "purple", "seagreen"]
ax.set_title("contour with color list")
contour = ax.contourf(X,Y,Z, colors=colors)
ax2.set_title("contour with colormap")
cmap = matplotlib.colors.ListedColormap(colors)
contour = ax2.contourf(X,Y,Z, cmap=cmap)
fig.colorbar(contour)
plt.show()
You can observe slightly different behavior in the two cases. With the list of colors, contourf colors one level after the other in the order of the list. With the colormap, the range between the minimum and maximum value is mapped onto the colormap, so the first and last colors definitely appear in the plot, but intermediate colors (like "limegreen" in this case) are omitted, because there is one level fewer than there are colors in the list.
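If you want the colormap version to use every listed color, one option (not part of the comparison above) is to pass explicit levels with exactly one band per color together with a BoundaryNorm, which maps a value in [levels[i], levels[i+1]) to color index i. This is a minimal sketch reusing the same Z and color list; the Agg backend is chosen only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap, BoundaryNorm

x = np.linspace(-3, 3)
X, Y = np.meshgrid(x, x)
Z = np.exp(-(X**2 + Y**2))

colors = ["red", "orange", "gold", "limegreen", "k",
          "#550011", "purple", "seagreen"]
cmap = ListedColormap(colors)

# len(colors) + 1 boundaries -> exactly len(colors) bands over the data range
levels = np.linspace(Z.min(), Z.max(), len(colors) + 1)
# BoundaryNorm sends a value in [levels[i], levels[i+1]) to color index i
norm = BoundaryNorm(levels, ncolors=cmap.N)

fig, ax = plt.subplots()
cs = ax.contourf(X, Y, Z, levels=levels, cmap=cmap, norm=norm)
fig.colorbar(cs)

# Color index assigned to the midpoint of each band
mids = (levels[:-1] + levels[1:]) / 2
band_colors = np.asarray(norm(mids)).tolist()
print(band_colors)
```

The design choice here is to make the number of bands equal the number of colors, so no color can be skipped by the normalization.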
Let me explain every single line of the code:
x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
This part of the code computes the limits of the plot. To leave some margin around the data, the upper limit of each feature is increased by 1 and the lower limit is decreased by 1, so that no data point sits right on the edge of the axes.
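A quick check of the padding on a small, made-up feature matrix (four samples, two features; the values are purely illustrative):

```python
import numpy as np

# Made-up toy data: 4 samples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 0.5],
              [2.0, 4.0],
              [0.0, 1.0]])

# Pad each feature's range by 1 unit on both sides
x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
print(x1_min, x1_max)  # -1.0 4.0
print(x2_min, x2_max)  # -0.5 5.0
```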
xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution), np.arange(x2_min, x2_max, resolution))
NumPy's meshgrid builds coordinate matrices from coordinate vectors. Here it spans a rectangular grid (mesh) over the whole plot area: along the first axis it runs from x1_min to x1_max, along the second from x2_min to x2_max.
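np.arange(start, stop, step) returns evenly spaced values from start up to, but excluding, stop; here resolution is the step size. With a large step (say 2), the grid is coarse and the individual grid cells are clearly visible, so the region boundaries look blocky. A small step such as 0.02 makes the regions look smooth. The price is more grid points, and therefore more predictions, so resolution trades smoothness against speed.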
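To see how resolution drives the grid size, this sketch builds the grid for a few step sizes over hypothetical padded limits (the limit values are assumptions, not taken from the question):

```python
import numpy as np

# Hypothetical padded limits, e.g. produced by the previous step
x1_min, x1_max = -1.0, 4.0
x2_min, x2_max = -0.5, 5.0

shapes = {}
for resolution in (2, 0.5, 0.02):
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    shapes[resolution] = xx1.shape
    # xx1.size is the number of grid points, i.e. rows later sent to predict()
    print(resolution, xx1.shape, xx1.size)
```

With a floating-point step like 0.02, np.arange may yield one element more or fewer than (stop - start) / step suggests; for plotting this does not matter.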
If you have the coordinate vectors x1 = [-1, 0, 1] and x2 = [-2, 0, 1], the grid contains the points
(-1,-2) (0,-2) (1,-2)
(-1,0) (0,0) (1,0)
(-1,1) (0,1) (1,1)
and meshgrid (with its default indexing='xy') returns two 3X3 matrices, one holding the x1 coordinate and one holding the x2 coordinate of every point:
xx1 = [[-1 0 1] [-1 0 1] [-1 0 1]] (3X3 matrix)
xx2 = [[-2 -2 -2] [0 0 0] [1 1 1]] (3X3 matrix)
Moving on to the next step:
Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
Z = Z.reshape(xx1.shape)
NumPy's .ravel() method flattens an array into 1D. In the above example,
xx1.ravel() = [-1 0 1 -1 0 1 -1 0 1]
xx2.ravel() = [-2 -2 -2 0 0 0 1 1 1]
np.array() stacks both vectors into a single 2X9 array:
[[-1 0 1 -1 0 1 -1 0 1]
 [-2 -2 -2 0 0 0 1 1 1]] (2X9 matrix)
.T transposes this into a 9X2 matrix in which each row is one (x1, x2) coordinate pair, i.e. one grid point. classifier.predict() returns one predicted class label per row (a 1D array of 9 labels), and Z.reshape(xx1.shape) arranges these labels back into the 3X3 grid shape, so that each label sits at the position of its grid point.
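Here is a runnable version of these steps using the same coordinate vectors as the 3X3 example. Since no fitted model is involved, classifier.predict is replaced by a hypothetical rule (class 1 if x1 + x2 > 0, else class 0):

```python
import numpy as np

x1 = np.array([-1, 0, 1])
x2 = np.array([-2, 0, 1])
xx1, xx2 = np.meshgrid(x1, x2)  # default indexing='xy'

# One (x1, x2) row per grid point
points = np.array([xx1.ravel(), xx2.ravel()]).T
print(points.shape)  # (9, 2)

def predict(P):
    """Hypothetical stand-in for classifier.predict: 1 if x1 + x2 > 0."""
    return (P[:, 0] + P[:, 1] > 0).astype(int)

Z = predict(points)       # 1D array with 9 labels
Z = Z.reshape(xx1.shape)  # labels back on the 3X3 grid
print(Z)
```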
plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
plt.xlim(xx1.min(), xx1.max())
plt.ylim(xx2.min(), xx2.max())
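contourf draws filled contour plots. Because Z holds the predicted class label for every point of the xx1 x xx2 grid, the filled areas show which class the classifier assigns to each part of the plane, i.e. its decision regions. alpha=0.4 makes the regions semi-transparent, and plt.xlim/plt.ylim fix the axes to the extent of the grid.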
Finally,
for idx, cl in enumerate(np.unique(y)):
    plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1], alpha=0.8,
                c=cmap(idx), marker=markers[idx], label=cl)
This part plots the actual data points on top of the decision regions.
np.unique() returns the sorted unique values of an array, here the class labels. If your model has 2 outputs, i.e. either yes or no, the whole data set is split into 2 classes.
enumerate() yields (index, value) pairs. For example:
>>> elements = ('foo', 'bar', 'baz')
>>> for count, elem in enumerate(elements):
...     print(count, elem)
...
0 foo
1 bar
2 baz
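Putting np.unique, enumerate and the boolean masks from the loop together, this sketch shows which samples each iteration selects. X and y are made-up toy data:

```python
import numpy as np

# Made-up toy data: 5 samples with 2 features, and their class labels
X = np.array([[1.0, 2.0],
              [3.0, 0.5],
              [2.0, 4.0],
              [0.0, 1.0],
              [4.0, 3.0]])
y = np.array([1, 0, 1, 0, 1])

selected = {}
for idx, cl in enumerate(np.unique(y)):  # np.unique(y) -> array([0, 1])
    mask = (y == cl)                      # True for samples of class cl
    selected[int(cl)] = X[mask, 0].tolist()
    print(idx, cl, X[mask, 0], X[mask, 1])
```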
So in the loop above, idx is 0 for all points of the first class (say "no") and 1 for all points of the second class ("yes"). Since cmap = ListedColormap(colors[:len(np.unique(y))]), cmap(idx) returns the idx-th color of that list: cmap(0) gives the first color to all scattered points of the first class, and markers[idx] likewise gives each class its own marker.
In each iteration of the loop, all data points belonging to one class are therefore drawn in the same color and plotted in the graph.
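A small check of how cmap(idx) picks colors. The five-color palette and the labels are assumptions for illustration, since the colors tuple in the question is cut off:

```python
import numpy as np
from matplotlib.colors import ListedColormap, to_rgba

colors = ['red', 'blue', 'lightgreen', 'gray', 'cyan']  # assumed palette
y = np.array([0, 1, 1, 0, 1])                           # made-up labels

# Keep only as many colors as there are classes
cmap = ListedColormap(colors[:len(np.unique(y))])

print(cmap.N)                                  # 2
print(np.allclose(cmap(0), to_rgba('red')))    # True
print(np.allclose(cmap(1), to_rgba('blue')))   # True
```

Passing an integer to a colormap indexes its color list directly, which is why cmap(idx) lines up with the class order from enumerate.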
label attaches the class value to each scatter call, so that a legend (created with plt.legend()) shows which color and marker belong to which class.
This is a common way to visualize a classifier's decision regions in two dimensions.
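Since the snippet in the question stops at colors = (, here is a minimal sketch that assembles the lines explained above into one function. The five color names, the ThresholdClassifier class and the toy data are assumptions for illustration, and Z is returned only so the result can be inspected:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() interactively instead
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap


def plot_decision_regions(X, y, classifier, resolution=0.02):
    # marker and color palettes (the color names are an assumption)
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(list(colors[:len(np.unique(y))]))

    # padded plot limits and a fine grid over them
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))

    # predict a label for every grid point, then restore the grid shape
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)

    # shaded decision regions, axes fixed to the grid
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    # data points, one color and marker per class
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1], alpha=0.8,
                    color=cmap(idx), marker=markers[idx], label=cl)
    return Z


class ThresholdClassifier:
    """Hypothetical stand-in for a fitted model: class 1 if x1 + x2 > 0."""
    def predict(self, P):
        return (P[:, 0] + P[:, 1] > 0).astype(int)


X = np.array([[-2.0, -1.0], [-1.0, -1.5], [1.0, 2.0], [2.0, 0.5]])
y = np.array([0, 0, 1, 1])
Z = plot_decision_regions(X, y, ThresholdClassifier(), resolution=0.1)
plt.legend(loc='upper left')
```

Any object with a predict method that accepts an (n, 2) array can be passed as classifier.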