I am having trouble understanding the weight update rule for perceptrons:
w(t + 1) = w(t) + y(t)x(t).
Assume we have a linearly separable data set.
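For concreteness, here is a minimal sketch of the training loop I have in mind (my own code, assuming NumPy, labels y in {-1, +1}, and a bias term folded into x as a constant feature):

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Perceptron learning: apply w <- w + y*x only on misclassified examples."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:   # misclassified (or exactly on the boundary)
                w = w + y_i * x_i           # the update rule in question
                mistakes += 1
        if mistakes == 0:                   # a full pass with no mistakes: done
            break
    return w
```

On a linearly separable data set this loop stops after a finite number of updates (the perceptron convergence theorem).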
A better derivation of the perceptron update rule is documented here and here. The derivation uses gradient descent.
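As far as I can follow it, that argument applies gradient descent to the perceptron criterion on a misclassified example $(x(t), y(t))$:

$$\ell(w) = -\,y(t)\,\big(w^\top x(t)\big), \qquad \nabla_w \ell(w) = -\,y(t)\,x(t),$$

so a gradient step $w(t+1) = w(t) - \eta\,\nabla_w \ell(w) = w(t) + \eta\,y(t)\,x(t)$ reduces to the rule above when $\eta = 1$.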
PS: I was trying very hard to get the intuition for why one would multiply x and y to derive the update for w, because in a single dimension w is the slope (y = wx + c), and the slope is w = y/x, not y * x.