I am having Intel Core IvyBridge processor , Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz( L1-32KB,L2-256KB,L3-8MB). I know L3 is inclusive and shared among multiple core. I want
See section 2.2.5 in the Intel optimization guide -
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
(note that this applies for Sandy-Bridge, but doesn't appear as changed for Ivy-Bridge, which has only minor micro-architectural changes over the previous generation).
So regarding your questions:
Also note that if your benchmark is accessing a data-set larger than the L2, it will probably fail to sit in the L2 (especially if you access it serially and exceed the L2 by more than the size of a single way), and you'd have to fetch it from the L3.