import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def plot_decision_regions(X, y, classifier, resolution=0.02):
    # setup marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])
Let me explain every single line of the code:
x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
This part of the code computes the limits of the plot. To keep the graph from looking cramped, the upper limit of each feature is increased by 1 and the lower limit decreased by 1, so the plotted decision regions and data points do not touch the axes.
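For example, here is a minimal sketch of how those limits come out for a small, made-up feature matrix (the numbers are purely illustrative):

import numpy as np

X = np.array([[2.0, 1.0],
              [3.5, 0.5],
              [5.0, 2.5]])                                # 3 samples, 2 features (hypothetical data)
x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1     # 1.0 and 6.0
x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1     # -0.5 and 3.5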
xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution), np.arange(x2_min, x2_max, resolution))
The meshgrid method in NumPy creates coordinate matrices from coordinate vectors. Here it builds a rectangular grid (mesh) whose width is x1_max - x1_min and whose height is x2_max - x2_min.
np.arange(start, stop, step) generates the values between those limits, with resolution used as the step size. If the resolution were much larger than 0.02 (say 2), the individual grid points would be clearly visible to the human eye; to make the plotted region look completely smooth, the resolution is kept as small as necessary.
For example, if the x-coordinates are [-1, 0, 1] and the y-coordinates are [-2, 0, 1], the grid contains the points
(-1,-2) (0,-2) (1,-2)
(-1, 0) (0, 0) (1, 0)
(-1, 1) (0, 1) (1, 1)
and the meshgrid method turns the two coordinate vectors into two 3x3 matrices:
xx1 = [-1 0 1][-1 0 1][-1 0 1] (3x3 matrix, the x values repeated along each row)
xx2 = [-2 -2 -2][0 0 0][1 1 1] (3x3 matrix, each y value repeated across a row)
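The same example can be reproduced directly with NumPy; this is just a small sketch to make the shapes concrete:

import numpy as np

xs = np.array([-1, 0, 1])          # x-coordinates
ys = np.array([-2, 0, 1])          # y-coordinates
xx1, xx2 = np.meshgrid(xs, ys)     # two 3x3 coordinate matrices
# xx1 = [[-1  0  1]    xx2 = [[-2 -2 -2]
#        [-1  0  1]           [ 0  0  0]
#        [-1  0  1]]          [ 1  1  1]]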
Moving on to the next step,
Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
Z = Z.reshape(xx1.shape)
The .ravel() method of NumPy flattens an array into a 1D array. For the example above,
xx1.ravel() = [-1 0 1 -1 0 1 -1 0 1]
xx2.ravel() = [-2 -2 -2 0 0 0 1 1 1]
np.array() then stacks the two flattened vectors into a single 2 x 9 array:
[-1 0 1 -1 0 1 -1 0 1]
[-2 -2 -2 0 0 0 1 1 1] (2x9 matrix)
Taking the transpose with .T turns this into a 9x2 matrix in which each row is one (x1, x2) coordinate pair, i.e. one point of the grid. The classifier then predicts a class label for every one of these grid points, and the resulting 1D array of predictions Z is reshaped back to the shape of xx1 so that it lines up with the grid.
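As a small sketch of that pipeline, continuing the grid from the example above (the DummyClassifier is a made-up stand-in, since any object with a predict method works here):

import numpy as np

xx1, xx2 = np.meshgrid(np.array([-1, 0, 1]), np.array([-2, 0, 1]))
grid_points = np.array([xx1.ravel(), xx2.ravel()]).T    # shape (9, 2): one (x1, x2) pair per row

class DummyClassifier:                                   # hypothetical stand-in for a trained model
    def predict(self, X):
        # arbitrary rule: class 1 wherever x1 + x2 > 0, class 0 elsewhere
        return (X[:, 0] + X[:, 1] > 0).astype(int)

Z = DummyClassifier().predict(grid_points)               # shape (9,): one label per grid point
Z = Z.reshape(xx1.shape)                                 # back to 3x3 so it lines up with the grid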
plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
plt.xlim(xx1.min(), xx1.max())
plt.ylim(xx2.min(), xx2.max())
contourf draws a filled contour plot. Here Z holds the class predicted at every point of the xx1 x xx2 grid, so the filled regions are exactly the classifier's decision regions; the plot limits are then set to match the grid.
Finally,
for idx, cl in enumerate(np.unique(y)):
    plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1], alpha=0.8,
                c=cmap(idx), marker=markers[idx], label=cl)
In this part, the actual data points are plotted on top of the decision regions.
np.unique() returns an array of the unique class labels found in y. If the model has two possible outputs, say 'yes' or 'no', the whole dataset is split into those two categories.
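A quick sketch of what np.unique gives back for a toy label array:

import numpy as np

y = np.array(['no', 'yes', 'no', 'yes', 'no'])   # hypothetical labels
print(np.unique(y))                              # ['no' 'yes'] -- the unique labels, in sorted order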
The enumerate() method returns a count together with each value. For example:
>>> elements = ('foo', 'bar', 'baz')
>>> for count, elem in enumerate(elements):
...     print(count, elem)
...
0 foo
1 bar
2 baz
So in the code above, idx is 0 while plotting all points labelled "no" and 1 while plotting all points labelled "yes" (np.unique returns the labels in sorted order). Since cmap = ListedColormap(colors[:len(np.unique(y))]) was built from the first len(np.unique(y)) colors, cmap(0) returns the first color, which is applied to every scattered point in that category.
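A minimal sketch of that colormap lookup (the two colors here are just illustrative):

from matplotlib.colors import ListedColormap

cmap = ListedColormap(('red', 'blue'))   # one color per class
print(cmap(0))                           # (1.0, 0.0, 0.0, 1.0) -- RGBA for red
print(cmap(1))                           # (0.0, 0.0, 1.0, 1.0) -- RGBA for blue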
As the loop runs, all data points belonging to the same category are given the same color and marker and plotted on the graph.
The label argument attaches a legend entry to each class, so calling plt.legend() shows which class each color refers to.
This is how classifiers' decision regions are generally visualized.
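To close, here is an end-to-end sketch of how the function might be called. It assumes scikit-learn is available; the LogisticRegression model and the two Iris petal features are only an illustrative choice, as any classifier with a predict method works:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, [2, 3]]                      # petal length and petal width, so the regions stay 2D
y = iris.target

clf = LogisticRegression()
clf.fit(X, y)

plot_decision_regions(X, y, classifier=clf)   # the function explained above
plt.xlabel('petal length [cm]')
plt.ylabel('petal width [cm]')
plt.legend(loc='upper left')
plt.show()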