Running perf stat ls shows this:
Performance counter stats for 'ls':
1.388670 task-clock # 0.067 CPUs utilized
It looks like perf has not been updated to understand all the performance monitoring events that Ivy Bridge supports. Fortunately, there is a generic, albeit cumbersome, interface that gives you access to the full list of performance monitoring events. I didn't see stalled-cycles-backend in the list when I gave it a quick look, but maybe I missed it, or maybe it has been broken down into the different events that can stall the backend.
We start with perf list --help, which shows the following NOTE:
1. Intel(R) 64 and IA-32 Architectures Software Developer's Manual
Volume 3B: System Programming Guide
http://www.intel.com/Assets/PDF/manual/253669.pdf
...armed with that URL you end up in
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf
...you want section 19.3
19.3 PERFORMANCE MONITORING EVENTS FOR 3RD GENERATION INTEL® CORE™ PROCESSORS 3rd generation Intel® Core™ processors and Intel Xeon processor E3-1200 v2 product family are based on Intel microarchitecture code name Ivy Bridge. They support architectural performance-monitoring events listed in Table 19-1. Non-architectural performance-monitoring events in the processor core are listed in Table 19-5. The events in Table 19-5 apply to processors with CPUID signature of DisplayFamily_DisplayModel encoding with the following values: 06_3AH.
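To check whether Table 19-5 applies to your machine, you can compare your CPU's family and model against the 06_3AH signature quoted above. The following is a minimal sketch, assuming a Linux system where /proc/cpuinfo reports "cpu family" and "model" in decimal; the function name cpuid_signature and the sample text are illustrative and not part of any tool.

```python
import re

def cpuid_signature(cpuinfo_text):
    """Format family/model from /proc/cpuinfo text as DisplayFamily_DisplayModel."""
    # Anchor on line starts so that "model name" is not mistaken for "model".
    family = re.search(r"^cpu family\s*:\s*(\d+)", cpuinfo_text, re.MULTILINE)
    model = re.search(r"^model\s*:\s*(\d+)", cpuinfo_text, re.MULTILINE)
    if family is None or model is None:
        raise ValueError("cpu family/model not found in cpuinfo text")
    # Intel's tables write the signature as two-digit hex values, e.g. 06_3AH.
    return f"{int(family.group(1)):02X}_{int(model.group(1)):02X}H"

# Illustrative excerpt (assumed sample, not taken from the original post).
SAMPLE_CPUINFO = (
    "cpu family\t: 6\n"
    "model\t\t: 58\n"
    "model name\t: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz\n"
)

if __name__ == "__main__":
    print(cpuid_signature(SAMPLE_CPUINFO))
    # On a real machine, something like:
    # print(cpuid_signature(open("/proc/cpuinfo").read()))
```

If the result is 06_3AH, the non-architectural events of Table 19-5 should be the ones to look at.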
...so for architectural events you need Table 19-1
19.1 ARCHITECTURAL PERFORMANCE-MONITORING EVENTS Architectural performance events are introduced in Intel Core Solo and Intel Core Duo processors. They are also supported on processors based on Intel Core microarchitecture. Table 19-1 lists pre-defined architectural performance events that can be configured using general-purpose performance counters and associated event-select registers.
Table 19-1. Architectural Performance Events
...now comes the tricky part: the UMask Value forms the upper 2 hex digits and the Event Num forms the lower 2 hex digits of the 4-hex-digit raw event descriptor to be given to perf stat.
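The combination of the two values can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the helper name raw_event is made up here, and the values 2EH (Event Num) and 41H (UMask Value) are the ones that yield the r412e code used for cache-misses in this answer.

```python
def raw_event(event_num, umask):
    """Build a perf raw event string: UMask in the upper byte, Event Num in the lower."""
    if not (0 <= event_num <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event_num and umask must each fit in one byte")
    # Shift the umask into the upper two hex digits and OR in the event number.
    return f"r{(umask << 8) | event_num:04x}"

if __name__ == "__main__":
    # Event Num 2EH with UMask Value 41H
    print(raw_event(0x2E, 0x41))  # prints r412e
```

The resulting string is what goes after -e on the perf stat command line.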
perf stat --help
-e, --event= Select the PMU event. Selection can be a symbolic event name (use perf list to list all events) or a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a hexadecimal event descriptor.
...it says NNN, but you can give it NNNN. Let's verify that this works by asking perf stat for cache-misses both as a symbolic event name and as a hex number from Table 19-1. We'll invoke the date command for simplicity.
$ perf stat -e r412e -e cache-misses date
Fri Mar 28 09:28:52 CDT 2014
Performance counter stats for 'date':
2292 r412e
2292 cache-misses
0.003322663 seconds time elapsed
$
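If you want to repeat this comparison for several events, the counts can be pulled out of the text output with a small script. This is a rough sketch that only handles plain "count event-name" lines like the ones shown here; the function name parse_counts is hypothetical, and real perf stat output can contain extra columns or markers that this does not handle.

```python
import re

# The perf stat output from the run shown in this answer.
SAMPLE_OUTPUT = """\
Performance counter stats for 'date':
2292 r412e
2292 cache-misses
0.003322663 seconds time elapsed
"""

def parse_counts(text):
    """Return {event_name: count} for lines of the form '<integer> <event>'."""
    counts = {}
    for line in text.splitlines():
        # An integer count (possibly with thousands separators), then one token.
        match = re.match(r"\s*([\d,]+)\s+(\S+)\s*$", line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts

if __name__ == "__main__":
    counts = parse_counts(SAMPLE_OUTPUT)
    print(counts["r412e"] == counts["cache-misses"])
```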
As you can see, both reported the same number, so far so good. Now we go to Table 19-5 for the non-architectural events, of which there are too many to list here, but I'll list a few:
Just found this in the mailing-list thread "Re: perf, x86: Add parts of the remaining haswell PMU functionality":
> AFAICS backend stall cycles are documented to work on Ivy Bridge.
I'm not aware of any documentation that presents these events
as accurate frontend/backend stalls without using the full
TopDown methology (Optimization manual B.3.2)
So IIUC stalled-cycles-backend counters are too unreliable on Ivy Bridge, and that's why the kernel devs have decided to not support them.
And sure enough, Linux's perf_event_intel.c supports PERF_COUNT_HW_STALLED_CYCLES_BACKEND for Nehalem, Xeon E7 and SandyBridge, but not for IvyBridge. PERF_COUNT_HW_STALLED_CYCLES_FRONTEND is supported for IvyBridge, though.
So I guess there won't be a way to get this counter on my current CPU. The options are to either switch CPUs or use the full top-down methodology mentioned in the mail (and described here and here).
perf (or its in-kernel part) was not updated to support your CPU, so perf is unable to map the generic event name "stalled-cycles-backend" to an actual HW event.
In such a case it can be easier to look up the CPU-specific event names yourself; e.g. for Intel CPUs, in Intel's optimization manual http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf (which groups events by type and explains how to use them to measure various parts). I don't have a similar document for AMD.
To use event names with perf without manually converting them into raw event ids (as amdn does in his answer), you can use the converter tools showevtinfo and check_events from perfmon2 (libpfm4; examples folder), as explained in the article "How to monitor the full range of CPU performance events" by Bojan Nikolic http://www.bnikolic.co.uk/blog/hpc-prof-events.html. perfmon2 knows AMD and Intel CPUs, and is written in C/C++.
For Intel CPUs the easiest way is to use the ocperf wrapper around perf from "pmu-tools", Intel's open-source Python project by Andi Kleen, hosted on GitHub at https://github.com/andikleen/pmu-tools, announced on the mailing list (https://lwn.net/Articles/556983/) and described in Andi's blog (http://halobates.de/blog/p/245).
ocperf understands all Intel event names from Intel's optimization manual. ocperf also supports every HW event on older Linux kernels. It has its own database in TSV or JSON format with all HW events and their codes at https://download.01.org/perfmon/ (there is an auto-downloader in pmu-tools), and the database is constantly updated by Intel employees. The format of the database is documented in the readme: https://download.01.org/perfmon/readme.txt
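To give a feel for what such a database lookup involves, here is a small sketch that maps an event name to a raw perf code. It is not how ocperf itself is implemented. The field names EventName, EventCode and UMask, the two sample entries, and the helper raw_code_for are assumptions made for illustration; check the readme above for the actual file layout, which may differ (for example, some entries may list several event codes).

```python
import json

# Assumed, simplified excerpt of an event list (field names are an assumption).
SAMPLE_EVENTS_JSON = """
[
  {"EventName": "LONGEST_LAT_CACHE.MISS", "EventCode": "0x2E", "UMask": "0x41"},
  {"EventName": "CPU_CLK_UNHALTED.THREAD_P", "EventCode": "0x3C", "UMask": "0x00"}
]
"""

def raw_code_for(events, name):
    """Look up an event by name (case-insensitive) and return a perf rNNNN string."""
    for event in events:
        if event["EventName"].lower() == name.lower():
            event_num = int(event["EventCode"], 16)
            umask = int(event["UMask"], 16)
            # Same layout as before: UMask in the upper byte, event number below.
            return f"r{(umask << 8) | event_num:04x}"
    raise KeyError(name)

if __name__ == "__main__":
    events = json.loads(SAMPLE_EVENTS_JSON)
    print(raw_code_for(events, "longest_lat_cache.miss"))
```

In practice ocperf does this translation for you, so a script like this is mainly useful for understanding what the wrapper passes on to perf.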
For Sandy Bridge/Ivy Bridge or Haswell, and kernels 3.10 or newer, you can also use the toplev.py script from "pmu-tools" to investigate performance. Its author, Andi Kleen, describes it in "pmu-tools, part II: toplev" (http://halobates.de/blog/p/262). It is based on the "TopDown" method from Ahmad Yasin's "How to Tune Applications Using a Top-Down Characterization of Microarchitectural Issues" and "Top Down Analysis. Never lost with performance counters".