If you don't want to read the paper by Lowe that @sammy mentioned, here is a short summary:
- Image pyramid (see the OpenCV docs and Wikipedia) is a set of images derived from a single input image by repeatedly smoothing and downsampling it. A typical example is the Gaussian pyramid. We use pyramids in feature detection and matching for several reasons. Downsampling an image to a certain level does not necessarily mean we lose the features we need for matching, and it often removes some of the noise. High resolution, in the sense of fine detail rather than just large width and height, is also often not something we need: more detail means more processing power, which hurts on platforms built for low power consumption such as smartphones. Combine that with large image dimensions and things get even worse. How far you can downsample of course depends on the image and on the number of levels in the pyramid. Downsampling alters the pixels, and each feature is described by a keypoint and a descriptor; because the pixels change from level to level, the keypoints and descriptors change too. That is why a keypoint has to store the pyramid level (octave) at which it was extracted. Note that building image pyramids costs a fair amount of resources, but the trade-off pays off once you start doing something else with those images, such as matching (see the first sketch after this list).
- Keypoint angle refers to the orientation of the feature that the keypoint represents. A keypoint is not a single pixel but a small region around a feature (.pt.x and .pt.y just give the centre of that region), so when its orientation changes, the pixels change their position from the keypoint's point of view. Imagine you have a house with a door, a roof and so on, and you extract some features from it. Then you turn the camera upside down and take a new photo from exactly the same position. If the feature extractor handles orientation, you should get (almost) the same features, and hence the same keypoints, as in the photo taken before rotating the camera. If it does not handle orientation, you may lose most of the previously detected features and/or get new ones (see the second sketch below).
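Here is a minimal sketch of the pyramid point in Python/OpenCV: it builds a small Gaussian pyramid with cv2.pyrDown and then checks which pyramid level (octave) each ORB keypoint was found at. The file name "house.jpg" and the number of levels are just placeholders.

    import cv2

    img = cv2.imread("house.jpg", cv2.IMREAD_GRAYSCALE)

    # Gaussian pyramid: each level is blurred and halved in width and height.
    pyramid = [img]
    for _ in range(3):
        pyramid.append(cv2.pyrDown(pyramid[-1]))

    # ORB builds its own internal pyramid; each keypoint remembers its level.
    orb = cv2.ORB_create(nfeatures=500)
    keypoints, descriptors = orb.detectAndCompute(img, None)

    for kp in keypoints[:5]:
        # kp.octave is the pyramid level the keypoint was extracted at,
        # kp.pt is the (x, y) centre of the keypoint region, kp.size its diameter.
        print(kp.pt, kp.octave, kp.size)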
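And a sketch of the orientation point: ORB assigns each keypoint an angle, so features detected on an image and on an upside-down copy of it can still be matched. Again, "house.jpg" is only a placeholder, and the match count is just a rough sanity check, not a proper evaluation.

    import cv2

    img = cv2.imread("house.jpg", cv2.IMREAD_GRAYSCALE)
    rotated = cv2.rotate(img, cv2.ROTATE_180)   # "camera turned upside down"

    orb = cv2.ORB_create(nfeatures=500)
    kp1, des1 = orb.detectAndCompute(img, None)
    kp2, des2 = orb.detectAndCompute(rotated, None)

    # kp.angle holds the keypoint orientation in degrees (-1 if not computed).
    print([round(k.angle, 1) for k in kp1[:5]])

    # Brute-force Hamming matching; many matches should survive the rotation
    # because the descriptors are computed relative to each keypoint's angle.
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = bf.match(des1, des2)
    print(len(matches), "matches across the rotation")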
I recommend reading "Learning OpenCV". It is outdated in terms of OpenCV's API, but the theory it covers is explained really well.