You should also check the examples in the SDK. I remember one very nice example that implements reduction in several different ways, and at least one of those uses a warp-based reduction.
(I can't look up the name right now because I only have the SDK installed on my other machine.)
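
In the meantime, here is a rough sketch of what a warp-based reduction can look like. This is not the code from the SDK example. It is a minimal illustration that assumes CUDA 9 or later (for `__shfl_down_sync`) and a block size that is a multiple of 32. The names `warpReduceSum`, `sumKernel` and `sumOnDevice` are made up for this post, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Sum a value across the 32 lanes of a warp using register shuffles.
// After the loop, lane 0 holds the total for the whole warp.
__inline__ __device__ float warpReduceSum(float val) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        val += __shfl_down_sync(0xffffffffu, val, offset);
    }
    return val;
}

// Hypothetical kernel: each thread accumulates a grid-stride partial sum,
// each warp reduces its partial sums, and lane 0 of every warp adds the
// warp total to the global result with one atomicAdd.
__global__ void sumKernel(const float* in, float* out, int n) {
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        sum += in[i];
    }
    sum = warpReduceSum(sum);
    if ((threadIdx.x & (warpSize - 1)) == 0) {
        atomicAdd(out, sum);
    }
}

// Host helper (assumption: no error checking, for brevity).
float sumOnDevice(const std::vector<float>& host) {
    const int n = static_cast<int>(host.size());
    if (n == 0) return 0.0f;

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, sizeof(float));

    const int threads = 256;  // must be a multiple of 32 for the full mask
    const int blocks = (n + threads - 1) / threads;
    sumKernel<<<blocks, threads>>>(dIn, dOut, n);

    float result = 0.0f;
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}

int main() {
    std::vector<float> data(1000, 1.0f);
    printf("sum = %f\n", sumOnDevice(data));
    return 0;
}
```

The shuffle keeps the intra-warp step entirely in registers, so it needs neither shared memory nor `__syncthreads()`. The single `atomicAdd` per warp is just the simplest way to combine warp totals. A second reduction stage through shared memory is another common choice.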