This is a spin-off of this StackOverflow question.
Assume that you have a fixed number k of storage locations, and space for two counters. You will receive
A wild guess: discard the element that is farthest from the mean of the currently stored values.
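To make the guess concrete, here is a minimal sketch of that eviction rule, not a vetted algorithm. The function name `streaming_median_guess`, the way the two counters are used (counting evictions from the low and high ends), and the final rank lookup are all assumptions added for illustration. Note that the stored value farthest from the mean is always the current minimum or maximum, so only the two ends need checking.

```python
from bisect import insort

def streaming_median_guess(stream, k):
    """Heuristic sketch: keep at most k values; on overflow, evict the
    stored value farthest from the mean of the stored values."""
    stored = []      # sorted buffer holding at most k values
    low = high = 0   # the two counters: evictions from the low / high end
    n = 0
    for x in stream:
        n += 1
        insort(stored, x)
        if len(stored) > k:
            mean = sum(stored) / len(stored)
            # The farthest value from the mean is the min or the max.
            if mean - stored[0] >= stored[-1] - mean:
                stored.pop(0)
                low += 1
            else:
                stored.pop()
                high += 1
    # Assumed recovery step: treat every low eviction as smaller than the
    # answer, so the lower median sits at this index in the buffer.
    target = (n - 1) // 2 - low
    if 0 <= target < len(stored):
        return stored[target]
    return None  # the buffer drifted away from the median


print(streaming_median_guess([5, 1, 9, 3, 7, 2, 8, 4, 6], k=5))
```

The returned value is only a candidate: it is the true median when every low eviction really is smaller than it and every high eviction really is larger, which this sketch does not check. On a sorted stream with a small k the buffer drifts and the function returns `None`.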
Comparing to the current median doesn't work if the distribution of values is multi-modal and we get values from a non-dominant mode first.
Munro and Paterson studied essentially this problem in their paper "Selection and sorting with limited storage". Appealing to basic results about one-dimensional random walks, they show that your algorithm requires k = Ω(√n) to succeed with constant probability, and that this is asymptotically optimal.
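For intuition about where √n comes from, the following simulation measures how far a simple ±1 random walk strays from the origin in n steps. It is only an illustration and not Munro and Paterson's argument: modelling the imbalance between small and large arrivals as an unbiased ±1 walk is an assumption made here, and the names `max_excursion` and `mean_scaled_excursion` are hypothetical.

```python
import math
import random

def max_excursion(n, rng):
    """Largest |position| reached by a simple +/-1 random walk of n steps."""
    pos = 0
    worst = 0
    for _ in range(n):
        pos += 1 if rng.random() < 0.5 else -1
        worst = max(worst, abs(pos))
    return worst

def mean_scaled_excursion(n, trials=200, seed=0):
    """Average of max_excursion / sqrt(n) over several independent walks."""
    rng = random.Random(seed)
    total = sum(max_excursion(n, rng) for _ in range(trials))
    return total / trials / math.sqrt(n)


for n in (400, 1600, 6400):
    print(n, round(mean_scaled_excursion(n), 2))
```

The scaled excursion stays roughly constant as n grows, which is the sense in which the walk wanders a distance on the order of √n. Under the stated modelling assumption, a buffer much smaller than that would be expected to lose track of the median.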
If I wanted to prove absolute optimality, the first thing I would try would be to consider an arbitrary algorithm A and then couple its execution with an algorithm A' that, the first time A deviates from your algorithm, does what your algorithm would do instead and then attempts to follow A as closely as it can.