- Using the raw intensities as the feature vector is not going to work [1]. Lighting and similar effects induce too much variation.
- A good feature to try first is HOG (histogram of oriented gradients). OpenCV 2.2 has a GPU (CUDA) version of a HOG detector that is fast.
- Neural networks are probably not the best way to go. Usually you'd use an SVM or boosting as the classifier [2]. It's not that neural networks aren't powerful enough; it's that getting the training and parameters right is hard, and too often you get stuck in local minima.
- For prone, crouched, and standing figures, you definitely want separate classifiers, combined in a mixture model.
- You asked for a "best way". Human detection is far from a solved problem, so no one knows the best way. The things mentioned above are known to work pretty well.
- If you want a good result, you definitely want to exploit the fact that your target is specific: you are trying to detect humans in Call of Duty. The range of positions you need to check is not the whole image, because the figures will be near the ground. This lets you speed up the search and reduce false detections. If you can, reduce the rendering detail: less detail means less variation, which means an easier learning problem.
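
To make the last point concrete, here is a minimal sketch of restricting a sliding-window search to the part of the frame where figures can stand. Everything in it is an assumption for illustration: the function name `ground_band_windows`, the 64x128 window, the 16-pixel stride, and the idea that the horizon sits at a fixed fraction of the frame height. In a real game you would have to estimate that from the camera pitch.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def ground_band_windows(
    image_w: int,
    image_h: int,
    win_w: int = 64,       # assumed detection window width
    win_h: int = 128,      # assumed detection window height
    stride: int = 16,      # assumed step between windows, in pixels
    horizon: float = 0.5,  # hypothetical: horizon as a fraction of frame height
) -> Iterator[Window]:
    """Yield only windows whose bottom edge lies at or below the horizon.

    A figure standing on the ground has its feet below the horizon line,
    so windows that end above it are skipped entirely.
    """
    horizon_y = int(image_h * horizon)
    # Highest allowed window: its bottom edge just touches the horizon.
    y_start = max(0, horizon_y - win_h)
    for y in range(y_start, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


if __name__ == "__main__":
    restricted = list(ground_band_windows(640, 480))
    full = list(ground_band_windows(640, 480, horizon=0.0))  # whole frame
    print(len(restricted), "windows instead of", len(full))
```

With these defaults a 640x480 frame gives 592 windows instead of 851 for a full scan. Each skipped window is one classifier evaluation saved and one fewer chance of a false detection in the sky.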
Footnotes:
[1] For the nitpickers: at least not without a highly complex classifier.
[2] You can also use a cascade of boosted classifiers to gain speed without giving up much detection rate.
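
The cascade idea from the second footnote can be sketched roughly like this. It is a toy illustration, not a trained detector: the decision stumps, weights, and thresholds are made up, and the names `cascade_predict`, `stump`, and `EXAMPLE_STAGES` are hypothetical. The point is only the control flow: cheap early stages reject most windows, so the expensive later stages run rarely.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
WeakClassifier = Callable[[Features], float]  # returns +1.0 or -1.0
# A stage is a boosted classifier: weighted weak classifiers plus a threshold.
Stage = Tuple[List[Tuple[float, WeakClassifier]], float]


def stump(index: int, cut: float) -> WeakClassifier:
    """Decision stump: votes +1 if feature `index` exceeds `cut`, else -1."""
    return lambda x: 1.0 if x[index] > cut else -1.0


def stage_score(weak: List[Tuple[float, WeakClassifier]], x: Features) -> float:
    """Weighted vote of the weak classifiers in one stage."""
    return sum(alpha * h(x) for alpha, h in weak)


def cascade_predict(stages: List[Stage], x: Features) -> bool:
    """Accept only if every stage accepts; stop at the first rejection."""
    for weak, threshold in stages:
        if stage_score(weak, x) < threshold:
            return False  # rejected early, later stages are never evaluated
    return True


# Made-up stages: a cheap one-stump stage followed by a two-stump stage.
EXAMPLE_STAGES: List[Stage] = [
    ([(1.0, stump(0, 0.2))], 0.0),
    ([(0.6, stump(1, 0.5)), (0.4, stump(2, 0.5))], 0.0),
]

if __name__ == "__main__":
    print(cascade_predict(EXAMPLE_STAGES, [0.1, 0.9, 0.9]))  # fails stage 1
    print(cascade_predict(EXAMPLE_STAGES, [0.5, 0.9, 0.1]))  # passes both
```

In practice each stage would be trained with boosting on HOG or similar features, with thresholds tuned so that early stages almost never reject a true positive.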