How does the associativity and page size constrain the Cache size in virtually addressed cache architecture?
Particular
Intel CPUs have used 8-way associative 32kiB L1D with 64B lines for many years, for exactly this reason. Pages are 4k, so the page offset is 12 bits, exactly the same number of bits that make up the index and offset within a cache line.
See the "L1 also uses speed tricks that wouldn't work if it was larger" paragraph in this answer for more details about how it lets the cache avoid aliasing problems like a PIPT cache, but still be as fast as a VIPT cache.
The idea is that the virtual address bits below the page offset are already physical address bits. So a VIPT cache that works this way is more like a PIPT cache with free translation of the index bits.