As I understand it, a projection matrix scales a polygon depending on how far it is from the camera, though I might be completely wrong. My question i
The function of a projection matrix (in the context of graphics APIs, such as OpenGL) is to transform vertex positions from view-space into clip-space.
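Clip space is a homogeneous coordinate space: a vertex (x, y, z, w) lies inside the view volume when -w <= x, y, z <= w (in D3D the depth range is 0 <= z <= w instead). After the perspective divide by w, that volume becomes the normalized device coordinate box, which by default is [-1, 1] on every axis in OpenGL and has z in [0, 1] in D3D. Primitives that fall entirely outside the volume are discarded, and those that straddle its boundary are clipped against it. This is essentially how the system "knows" whether the cube is visible on the screen. The divide by w is also what makes distant geometry appear smaller, which is the scaling effect you describe.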
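
To make that concrete, here is a minimal Python sketch of the idea, assuming an OpenGL-style perspective matrix (right-handed view space, camera looking down -z, default depth range). The function names `perspective`, `to_clip`, `inside_clip_volume` and `to_ndc` are made up for illustration and are not part of any graphics API; on real hardware the clip test and the divide happen in the fixed-function pipeline after the vertex shader.

```python
import math

def perspective(fov_y_deg, aspect, near, far):
    """Build an OpenGL-style perspective matrix (rows of a 4x4, column-vector convention)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],  # copies -z_view into w_clip
    ]

def to_clip(m, p):
    """Transform a view-space point (x, y, z) into clip space (x, y, z, w)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

def inside_clip_volume(clip):
    """OpenGL-style test: -w <= x, y, z <= w."""
    x, y, z, w = clip
    return all(-w <= c <= w for c in (x, y, z))

def to_ndc(clip):
    """Perspective divide: clip space -> normalized device coordinates."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)

proj = perspective(90.0, 1.0, 0.1, 100.0)

# A point in front of the camera passes the test; one behind it does not.
print(inside_clip_volume(to_clip(proj, (0.0, 0.0, -1.0))))
print(inside_clip_volume(to_clip(proj, (0.0, 0.0, 1.0))))

# The same x offset lands closer to the screen centre when it is farther away.
print(to_ndc(to_clip(proj, (1.0, 0.0, -2.0)))[0])
print(to_ndc(to_clip(proj, (1.0, 0.0, -4.0)))[0])
```

Note that the matrix itself does not shrink anything: it only arranges for w to hold the view-space distance, and the later divide by w produces the perspective scaling.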