I am having trouble understanding the weight update rule for perceptrons:
w(t + 1) = w(t) + y(t)x(t).
Assume we have a linearly separable data set.
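To make sure I have the algorithm itself right, here is a minimal sketch of the training loop as I understand it (Python with NumPy; the toy data and the name perceptron_train are just made up for illustration):

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Perceptron learning algorithm sketch.

    X: (n_samples, n_features) array, with a bias column of 1s included.
    y: labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        updated = False
        for x_t, y_t in zip(X, y):
            # Misclassified if the sign of the dot product disagrees with the label.
            if y_t * np.dot(w, x_t) <= 0:
                # The update rule in question: w(t+1) = w(t) + y(t) x(t)
                w = w + y_t * x_t
                updated = True
        if not updated:   # converged: every point classified correctly
            break
    return w

# Tiny linearly separable toy set (bias term prepended as 1).
X = np.array([[1.0,  2.0,  1.0],
              [1.0,  1.5,  2.0],
              [1.0, -1.0, -1.5],
              [1.0, -2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron_train(X, y)
print(w, np.sign(X @ w))  # predicted signs should match y
```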
The perceptron's output is the hard limit (sign) of the dot product between the instance and the weight vector. Let's see how this dot product changes after the update. Since
w(t + 1) = w(t) + y(t)x(t),
then
x(t) ⋅ w(t + 1) = x(t) ⋅ w(t) + x(t) ⋅ (y(t) x(t)) = x(t) ⋅ w(t) + y(t) [x(t) ⋅ x(t)].
Note that x(t) ⋅ x(t) = ‖x(t)‖² ≥ 0 and y(t) ∈ {−1, +1}, so the update shifts the dot product by y(t)‖x(t)‖², i.e. in the direction of the correct label.
How does this move the boundary relative to x(t)?
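A quick numeric check of that identity (the particular vectors below are arbitrary, chosen only so that x(t) starts out misclassified):

```python
import numpy as np

x_t = np.array([1.0, 2.0, -1.0])   # arbitrary instance (made-up numbers)
y_t = -1.0                          # its true label
w_t = np.array([0.5, 0.3, 0.8])    # current weights, which misclassify x_t

before = np.dot(x_t, w_t)          # = 0.3, positive, so it disagrees with y_t = -1
w_next = w_t + y_t * x_t           # the update rule
after = np.dot(x_t, w_next)

# after == before + y_t * ||x_t||^2, so the dot product is pushed toward sign(y_t)
print(before, after, before + y_t * np.dot(x_t, x_t))  # ≈ 0.3, -5.7, -5.7
```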
A better derivation of the perceptron update rule is documented here and here; that derivation uses gradient descent.
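For reference, the standard gradient-descent view (sketched here from the usual perceptron-criterion derivation, not necessarily the exact argument behind those links):

```latex
\begin{align*}
  % Perceptron criterion on a misclassified point, i.e. y(t)\,(w \cdot x(t)) \le 0:
  E(w) &= -\,y(t)\,\bigl(w \cdot x(t)\bigr) \\
  % Its gradient with respect to w:
  \nabla_w E(w) &= -\,y(t)\,x(t) \\
  % One stochastic gradient step with learning rate \eta:
  w(t+1) &= w(t) - \eta\,\nabla_w E(w) = w(t) + \eta\,y(t)\,x(t)
  % With \eta = 1 this is exactly the update rule above.
\end{align*}
```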
PS: I was trying very hard to get the intuition for why someone would multiply x and y to derive the update for w, because in one dimension w is the slope (y = wx + c), and the slope is w = y/x, not y * x.
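To reconcile the two pictures for myself, I tried a quick 2D check: in the perceptron, w is the normal vector of the decision boundary w ⋅ x + b = 0, not the slope of a fitted line y = wx + c, so in two dimensions the boundary's slope works out to -w1/w2 (the numbers below are made up):

```python
import numpy as np

# Suppose the learned weights in 2D are w = (w1, w2) with bias b (made-up values).
w1, w2, b = 2.0, 1.0, -1.0

# The decision boundary is w1*x1 + w2*x2 + b = 0, i.e. x2 = -(w1/w2)*x1 - b/w2.
slope = -w1 / w2          # boundary slope is -w1/w2, not y/x
intercept = -b / w2
print(slope, intercept)   # -2.0, 1.0

# w itself is perpendicular (normal) to the boundary:
boundary_dir = np.array([1.0, slope])            # a direction along the boundary
print(np.dot(np.array([w1, w2]), boundary_dir))  # ≈ 0.0
```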