I understand conceptually what is happening in a max/sum pool as a CNN layer operation, but I see this term "max pool over time", or "sum pool over time" thrown around (e.g.
Max pooling typically applies to regions in a 2d feature plane, while max pooling over time happens along a 1d feature vector.
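To make the 2d case concrete, here is a minimal NumPy sketch of non-overlapping max pooling over a single feature plane. The function name `max_pool_2d`, the 2x2 window, and the input values are my own illustrative choices, not taken from any particular course or library.

```python
import numpy as np

def max_pool_2d(x, size=2):
    """Non-overlapping max pooling (window == stride) over a 2d feature plane."""
    h, w = x.shape
    h_out, w_out = h // size, w // size
    # Drop trailing rows/columns that do not fill a complete window.
    x = x[:h_out * size, :w_out * size]
    # Split each axis into (window index, position inside window),
    # then take the max over the two "inside window" axes.
    return x.reshape(h_out, size, w_out, size).max(axis=(1, 3))

# Illustrative 4x4 feature plane (made-up values).
feature_map = np.array([
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
])

pooled_2d = max_pool_2d(feature_map)
print(pooled_2d)
# [[6 8]
#  [3 4]]
```

Each output value is the max of one 2x2 region, so the plane shrinks but keeps its 2d layout.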
[Figure: demonstration of max pooling, from Stanford's CS231n]
Max pooling over time takes a 1d feature vector and reduces it to a single value, its maximum. The "over time" just means the max is taken along the time dimension of some sequential input, such as the positions in a sentence, or a concatenation of all phrases from a sentence as in the paper you linked. Sum pooling over time is the same idea with a sum in place of the max.
For example:
[2, 7, 4, 1, 5] -> [7]
Source: CS224d Lecture 13 slides
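The same reduction can be sketched in NumPy. This is only an illustration under my own assumptions: the function name `max_pool_over_time`, the `(time, num_filters)` layout, and the second column of activations are made up, and real frameworks may order the axes differently.

```python
import numpy as np

def max_pool_over_time(features):
    """Take the max along the time axis (axis 0).

    A 1d input of shape (time,) yields a scalar; a 2d input of shape
    (time, num_filters) yields one value per filter.
    """
    return np.asarray(features).max(axis=0)

# The 1d example from above: one filter's activations over 5 time steps.
print(max_pool_over_time([2, 7, 4, 1, 5]))  # 7

# Hypothetical activations for 2 filters over the same 5 time steps.
activations = np.array([
    [2, 0],
    [7, 1],
    [4, 9],
    [1, 3],
    [5, 2],
])

max_pooled = max_pool_over_time(activations)
sum_pooled = activations.sum(axis=0)  # "sum pool over time"

print(max_pooled)  # [7 9]
print(sum_pooled)  # [19 15]
```

Either way the output has a fixed size (one number per filter) regardless of how long the sequence is, which is why this kind of pooling is handy for variable-length inputs like sentences.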