As a part of my thesis work, I need to build a program for human tracking from video or image sequence like the KTH or IXMAS dataset with the assumptions:
The above method : a simple frame differencing followed by dilation and erosion would work, in case of a simple clean scene with just the motion of the person walking with absolutely no other motion or illumination changes. Also you are doing a detection every frame, as opposed to tracking. In this specific scenario, it might not be much more difficult to track either. Movement direction and speed : you can just run Lucas Kanade on the difference images.
At the core of it, what you need is a person detector followed by a tracker. Tracker can be either point based (Lucas Kanade or Horn and Schunck) or using Kalman filter or any of those kind of tracking for bounding boxes or blobs.
A lot of vision problems are ill-posed, some some amount of structure/constraints, helps to solve it considerably faster. Few questions to ask would be these :