I am trying to find out if there are any principles in defining which pages should be gzip-compressed and to draw a line when to send plain html content.
It would be he
A good idea is to benchmark, how fast is the data coming down v.s. how well compressed is it. If it takes 5 seconds to send something that went from 200K to 160K it's probably not worth it. There is a cost of compression on the server side and if the server gets busy it might not be worth it.
For the most part, if your server load is below 0.8 regularly, I'd just gzip anything that isn't binary like jpegs, pngs and zip files.
There's a good write up here:
http://developer.yahoo.com/performance/rules.html#gzip