I\'m putting together a small patch for the cachegrind/callgrind tool in valgrind which will auto-detect, using completely generic code, CPU instruction and cache configuration
Could you do a small program that only accesses lines from the same set? Then you can increase the stack distance between the accesses and when the execution time dramatically fall, you can assume you have reach the associativity.
It's probably not very stable, but maybe that could give a lead, don't know. I hope it can help.