I am trying to figure out why a modified C program runs faster than its unmodified counterpart (I added only a few lines of code that perform some additional work).
Some answers:
L1 is the Level-1 cache, the smallest and fastest one. LLC, on the other hand, refers to the last level of the cache hierarchy, which is the largest but slowest cache.
i vs. d distinguishes the instruction cache from the data cache. Typically only L1 is split in this way; the other caches are shared between data and instructions.
TLB refers to the translation lookaside buffer, a cache used when mapping virtual addresses to physical ones.
You seem to think that the cache-misses event is the sum of all other kinds of cache misses (L1-dcache-load-misses, and so on). That is actually not true. The cache-misses event represents the number of memory accesses that could not be served by any of the caches.
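To make the naming scheme above concrete, here is a minimal sketch in C that splits a perf cache event name into the pieces described (which cache, instruction vs. data side, and whether it counts misses). The struct and the function decode_cache_event are hypothetical helpers written for this illustration, not part of perf, and the prefixes handled are only the common ones shown by perf list.

```c
#include <string.h>

/* Parts of a perf cache event name such as "L1-dcache-load-misses".
 * Hypothetical helper type for illustration; not part of perf. */
struct cache_event_parts {
    const char *cache;  /* "L1", "LLC" or "TLB"; NULL if not recognized */
    char side;          /* 'i' = instruction, 'd' = data, 'u' = unified */
    int counts_misses;  /* 1 if the name ends in "-misses", else 0 */
};

static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

static int ends_with(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Decode the prefix (cache and side) and the suffix (misses or not). */
struct cache_event_parts decode_cache_event(const char *name)
{
    struct cache_event_parts p = { NULL, 'u', 0 };

    if (starts_with(name, "L1-dcache-")) {
        p.cache = "L1";  p.side = 'd';
    } else if (starts_with(name, "L1-icache-")) {
        p.cache = "L1";  p.side = 'i';
    } else if (starts_with(name, "LLC-")) {
        p.cache = "LLC"; p.side = 'u';   /* shared by data and instructions */
    } else if (starts_with(name, "dTLB-")) {
        p.cache = "TLB"; p.side = 'd';
    } else if (starts_with(name, "iTLB-")) {
        p.cache = "TLB"; p.side = 'i';
    }
    p.counts_misses = ends_with(name, "-misses");
    return p;
}
```

Calling decode_cache_event("L1-dcache-load-misses") yields the L1 cache, data side, counting misses. The generic cache-misses name matches none of the prefixes, which fits the point above: it is a separate event, not a member of the per-cache family.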
I admit that perf's documentation is not the best around.
However, you can learn quite a lot by reading the documentation of the perf_event_open() function, provided you already have a good knowledge of how a CPU and a performance monitoring unit work (it is clearly not a computer architecture course):
http://web.eece.maine.edu/~vweaver/projects/perf_events/perf_event_open.html
For example, by reading it you can see that the cache-misses event shown by perf list corresponds to PERF_COUNT_HW_CACHE_MISSES.
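As a rough sketch of what that mapping looks like in code, the following C fragment opens a PERF_COUNT_HW_CACHE_MISSES counter through perf_event_open() for the calling thread and reads it after running a piece of work. The function names count_cache_misses and sample_workload are made up for this example, and the choice to exclude kernel and hypervisor activity is an assumption. Opening the counter can fail (for instance in a virtual machine without a PMU, or under a restrictive perf_event_paranoid setting), in which case the function returns -1.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical workload: touch one byte per 64 bytes of a 16 MiB buffer. */
void sample_workload(void)
{
    static volatile unsigned char buf[16u << 20];
    for (size_t i = 0; i < sizeof(buf); i += 64)
        buf[i]++;
}

/* Run work() and store the cache-misses count of the calling thread in
 * *misses. Returns 0 on success, -1 if the counter could not be used. */
int count_cache_misses(void (*work)(void), uint64_t *misses)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_MISSES;  /* perf's "cache-misses" */
    attr.disabled = 1;        /* start stopped, enable explicitly below */
    attr.exclude_kernel = 1;  /* assumption: count user space only */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: measure this thread on whichever CPU it runs. */
    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd == -1)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    ssize_t n = read(fd, misses, sizeof(*misses));
    close(fd);
    return n == (ssize_t)sizeof(*misses) ? 0 : -1;
}
```

A caller would pass its own function, for example count_cache_misses(sample_workload, &misses), and print the value when the return code is 0. Wrapping the original and the modified code paths this way gives a per-region count rather than the whole-program total reported by perf stat.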
According to the perf tutorial, Performance Monitoring Unit (PMU) events, or hardware events, are those that map directly to CPU-specific events defined by the CPU vendor. The hardware cache events, by contrast, are monikers provided by perf, which may be mapped to actual events provided by the CPU. To list perf's cache events, run perf list cache in a Linux terminal.
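For what it is worth, the perf_event_open() documentation describes how such a hardware cache moniker is encoded: the event type is PERF_TYPE_HW_CACHE and the config field packs a cache id, an operation id, and a result id. The sketch below builds that value for the event listed as L1-dcache-load-misses. The helper names hw_cache_config and init_l1d_load_miss_attr are invented for this example; the constants come from linux/perf_event.h.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Pack a PERF_TYPE_HW_CACHE config value:
 * cache id in bits 0-7, operation id in bits 8-15, result id in bits 16-23. */
uint64_t hw_cache_config(unsigned cache_id, unsigned op_id, unsigned result_id)
{
    return (uint64_t)cache_id
         | ((uint64_t)op_id << 8)
         | ((uint64_t)result_id << 16);
}

/* Fill attr for the moniker that perf list shows as L1-dcache-load-misses:
 * L1 data cache, read operation, miss result. */
void init_l1d_load_miss_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_HW_CACHE;
    attr->size = sizeof(*attr);
    attr->config = hw_cache_config(PERF_COUNT_HW_CACHE_L1D,
                                   PERF_COUNT_HW_CACHE_OP_READ,
                                   PERF_COUNT_HW_CACHE_RESULT_MISS);
}
```

Swapping PERF_COUNT_HW_CACHE_L1D for PERF_COUNT_HW_CACHE_LL or PERF_COUNT_HW_CACHE_DTLB gives the corresponding LLC and data-TLB monikers. Whether a given combination is actually backed by a counter depends on the CPU, which is why some of these events can show up as not supported.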