I am trying to figure out how to make the camera work like this:
I believe this comes from a mix-up between the "camera matrix" (the camera's position and orientation in world space) and its inverse, the "view matrix" (the matrix that converts from world space to view space).
First, a little background.
You're starting with the camera's world space position and its X, Y, and Z rotations. If the camera were just a typical object we were placing in the scene, we would set it up like this:
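glTranslatef(camX, camY, camZ);
glRotatef(x, 1.0f, 0.0f, 0.0f); // rotate x degrees about the X axis
glRotatef(y, 0.0f, 1.0f, 0.0f); // rotate y degrees about the Y axis
glRotatef(z, 0.0f, 0.0f, 1.0f); // rotate z degrees about the Z axis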
Together, these operations build the matrix I'll call "CameraToWorldMatrix": the matrix that transforms from camera space to world space.
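To make this concrete, here is a small self-contained C++ sketch that builds the same CameraToWorldMatrix by hand, without needing an OpenGL context. The helper names (`Mat4`, `mul`, `translation`, `rotation`, `transformPoint`, `cameraToWorld`) are hypothetical, made up for this illustration. It assumes the OpenGL fixed-function convention: column vectors (`M * v`), with each glTranslatef/glRotatef call right-multiplying the current matrix.

```cpp
#include <array>
#include <cmath>

// Minimal 4x4 matrix: row-major storage, column-vector convention (M * v),
// which is the convention the OpenGL fixed-function matrix stack uses.
using Mat4 = std::array<std::array<double, 4>, 4>;
using Vec3 = std::array<double, 3>;

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

// Returns a * b. Each glTranslatef/glRotatef call does current = current * b.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

Mat4 translation(double x, double y, double z) {
    Mat4 m = identity();
    m[0][3] = x;
    m[1][3] = y;
    m[2][3] = z;
    return m;
}

// Counterclockwise rotation of `deg` degrees about a principal axis
// (0 = X, 1 = Y, 2 = Z), the same convention as glRotatef.
Mat4 rotation(double deg, int axis) {
    const double rad = deg * std::acos(-1.0) / 180.0;
    const double c = std::cos(rad), s = std::sin(rad);
    const int i = (axis + 1) % 3, j = (axis + 2) % 3;  // the two axes that rotate
    Mat4 m = identity();
    m[i][i] = c;  m[i][j] = -s;
    m[j][i] = s;  m[j][j] = c;
    return m;
}

// Transforms a point (w = 1) by m.
Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        r[i] = m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3];
    return r;
}

// CameraToWorldMatrix, built in the same order as
// glTranslatef(camX, camY, camZ); glRotatef(x, 1,0,0); glRotatef(y, 0,1,0); glRotatef(z, 0,0,1);
Mat4 cameraToWorld(double camX, double camY, double camZ,
                   double x, double y, double z) {
    Mat4 m = translation(camX, camY, camZ);
    m = mul(m, rotation(x, 0));
    m = mul(m, rotation(y, 1));
    m = mul(m, rotation(z, 2));
    return m;
}
```

For example, `transformPoint(cameraToWorld(1, 2, 3, 30, 45, 60), {0.0, 0.0, 0.0})` gives `{1, 2, 3}`. The origin of camera space lands exactly at the camera's world position, whatever the rotation.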
However, when we're dealing with view matrices, we don't want to transform from camera space to world space. For the view matrix we want to transform coordinates from world space into camera space (the inverse operation). So our view matrix is really a "WorldToCameraMatrix".
To take the inverse of the "CameraToWorldMatrix", you undo each operation (negate each rotation angle and the translation) and perform them in reverse order.
The inverse of the above matrix would be:
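glRotatef(-z, 0.0f, 0.0f, 1.0f); // undo the Z rotation first
glRotatef(-y, 0.0f, 1.0f, 0.0f); // then the Y rotation
glRotatef(-x, 1.0f, 0.0f, 0.0f); // then the X rotation
glTranslatef(-camX, -camY, -camZ); // undo the translation last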
This is almost what you had, but with the order mixed up.
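Because the inverse of a product is the product of the inverses in reverse order, the calls listed above should produce a matrix that exactly cancels the CameraToWorldMatrix. The sketch below checks that numerically. It reuses hypothetical helpers (repeated so the snippet compiles on its own) and measures how far `worldToCamera * cameraToWorld` is from the identity matrix; the helper names are illustrative assumptions, not an existing API.

```cpp
#include <array>
#include <cmath>

// Row-major 4x4 matrix with column vectors (M * v), as in OpenGL's matrix stack.
using Mat4 = std::array<std::array<double, 4>, 4>;

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

// Returns a * b (what each glTranslatef/glRotatef call does to the current matrix).
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

Mat4 translation(double x, double y, double z) {
    Mat4 m = identity();
    m[0][3] = x; m[1][3] = y; m[2][3] = z;
    return m;
}

// Counterclockwise rotation of `deg` degrees about axis 0 = X, 1 = Y, 2 = Z (like glRotatef).
Mat4 rotation(double deg, int axis) {
    const double rad = deg * std::acos(-1.0) / 180.0;
    const double c = std::cos(rad), s = std::sin(rad);
    const int i = (axis + 1) % 3, j = (axis + 2) % 3;
    Mat4 m = identity();
    m[i][i] = c; m[i][j] = -s;
    m[j][i] = s; m[j][j] = c;
    return m;
}

// CameraToWorldMatrix: T(cam) * Rx(x) * Ry(y) * Rz(z).
Mat4 cameraToWorld(double camX, double camY, double camZ, double x, double y, double z) {
    return mul(mul(mul(translation(camX, camY, camZ), rotation(x, 0)), rotation(y, 1)),
               rotation(z, 2));
}

// WorldToCameraMatrix: undo each step in reverse order, i.e.
// glRotatef(-z,0,0,1); glRotatef(-y,0,1,0); glRotatef(-x,1,0,0); glTranslatef(-camX,-camY,-camZ);
Mat4 worldToCamera(double camX, double camY, double camZ, double x, double y, double z) {
    return mul(mul(mul(rotation(-z, 2), rotation(-y, 1)), rotation(-x, 0)),
               translation(-camX, -camY, -camZ));
}

// Largest absolute difference between m and the identity matrix.
double distanceFromIdentity(const Mat4& m) {
    double worst = 0.0;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            worst = std::fmax(worst, std::fabs(m[i][j] - (i == j ? 1.0 : 0.0)));
    return worst;
}
```

With any camera values, e.g. `(1, 2, 3, 30, 45, 60)`, `distanceFromIdentity(mul(worldToCamera(...), cameraToWorld(...)))` comes out at floating-point rounding level, in either multiplication order. That is what it means for the view matrix to be the true inverse of the camera matrix.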
In your code here:
// build the rotation: X, then Y, then Z
Math::Matrix<float> r = Math::IdentityMatrix();
r *= Math::RotationMatrix(rot.X, 1.0f, 0, 0);
r *= Math::RotationMatrix(rot.Y, 0, 1.0f, 0);
r *= Math::RotationMatrix(rot.Z, 0, 0, 1.0f);
// rotate the translation vector tr by r, then add it to the camera's position
Vector3D<float> new_pos = Math::ApplyMatrix(tr, r);
currentCamera->SetPosition(currentCamera->Position() + new_pos);
You were defining the "CameraToWorldMatrix" as "first rotate around X, then Y, then Z, then translate".
However, when you invert this, you get something different from what you were using as your "WorldToCameraMatrix", which was: translate, then rotate around Z, then rotate around Y, then rotate around X.
Because your view matrix was not actually the inverse of your camera matrix, the two described different transforms and got out of sync, which is what produces the weird behavior.
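To see why the order matters, the sketch below compares the corrected view matrix with one reading of the original: `glTranslatef(-camX, -camY, -camZ)` issued first, followed by `glRotatef(-z, ...)`, `glRotatef(-y, ...)`, `glRotatef(-x, ...)`. The original view-matrix code isn't shown, so the negated signs in `mismatchedWorldToCamera` are an assumption, and all helper names are hypothetical. A correct view matrix must send the camera's own position to the origin of camera space.

```cpp
#include <array>
#include <cmath>

// Row-major 4x4 matrix with column vectors (M * v), as in OpenGL's matrix stack.
using Mat4 = std::array<std::array<double, 4>, 4>;
using Vec3 = std::array<double, 3>;

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

// Returns a * b (what each glTranslatef/glRotatef call does to the current matrix).
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

Mat4 translation(double x, double y, double z) {
    Mat4 m = identity();
    m[0][3] = x; m[1][3] = y; m[2][3] = z;
    return m;
}

// Counterclockwise rotation of `deg` degrees about axis 0 = X, 1 = Y, 2 = Z (like glRotatef).
Mat4 rotation(double deg, int axis) {
    const double rad = deg * std::acos(-1.0) / 180.0;
    const double c = std::cos(rad), s = std::sin(rad);
    const int i = (axis + 1) % 3, j = (axis + 2) % 3;
    Mat4 m = identity();
    m[i][i] = c; m[i][j] = -s;
    m[j][i] = s; m[j][j] = c;
    return m;
}

// Transforms a point (w = 1) by m.
Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        r[i] = m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3];
    return r;
}

// Corrected view matrix: Rz(-z) * Ry(-y) * Rx(-x) * T(-cam).
Mat4 worldToCamera(double camX, double camY, double camZ, double x, double y, double z) {
    return mul(mul(mul(rotation(-z, 2), rotation(-y, 1)), rotation(-x, 0)),
               translation(-camX, -camY, -camZ));
}

// Assumed reading of the original view matrix: the translation is issued first,
// T(-cam) * Rz(-z) * Ry(-y) * Rx(-x), so it sits on the wrong side of the rotations.
Mat4 mismatchedWorldToCamera(double camX, double camY, double camZ, double x, double y, double z) {
    return mul(mul(mul(translation(-camX, -camY, -camZ), rotation(-z, 2)), rotation(-y, 1)),
               rotation(-x, 0));
}
```

For a camera at (1, 0, 0) rotated 90 degrees about Z, `transformPoint(worldToCamera(1, 0, 0, 0, 0, 90), {1.0, 0.0, 0.0})` gives the origin (up to rounding), while the mismatched version gives about `(-1, -1, 0)`. The two versions coincide when the rotation is zero, so the mismatch only becomes visible once the camera is rotated.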