问题
I installed perf on Haswell CPU( Intel Core i7-4790 ). But the "perf list" does not include "stalled-cycles-frontend" nor "stalled-cycles-backend". I checked the http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html and not found the performance events relevant to stalled-cycles-backend from the Table 19-7( Non-Architectural Performance Events In the Processor Core of 4th Generation Intel Core Processors).
So my question is: how can I measure stalled-cycles-backend using perf or other tools in Haswell CPU cores. The kernel is 3.19 and perf version is also 3.19.
Thanks
回答1:
Yes, there is no mapping of "stalled-cycles-frontend" and "stalled-cycles-backend" synthetic events in perf_events
subsystem in kernel for newer processors like Ivy Bridge or Haswell. And no mapping on older Core 2. Probably, this name/concept/idea is not good for changed and complex microarchitectures of modern Out-of-order CPUs without simple scalar measurement of global "Stall".
The code is in arch/x86/events/intel/core.c, and the synthetic event names are PERF_COUNT_HW_STALLED_CYCLES_FRONTEND
and PERF_COUNT_HW_STALLED_CYCLES_BACKEND
:
__init int intel_pmu_init(void)
{...
Both are defined since Nehalem, for Westmere, Sandy Bridge:
case INTEL_FAM6_NEHALEM:
case INTEL_FAM6_NEHALEM_EP:
case INTEL_FAM6_NEHALEM_EX:
/* UOPS_ISSUED.STALLED_CYCLES */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
/* UOPS_EXECUTED.CORE_ACTIVE_CYCLES,c=1,i=1 */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =
X86_CONFIG(.event=0xb1, .umask=0x3f, .inv=1, .cmask=1);
case INTEL_FAM6_WESTMERE:
case INTEL_FAM6_WESTMERE_EP:
case INTEL_FAM6_WESTMERE_EX:
/* UOPS_ISSUED.STALLED_CYCLES */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
/* UOPS_EXECUTED.CORE_ACTIVE_CYCLES,c=1,i=1 */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =
X86_CONFIG(.event=0xb1, .umask=0x3f, .inv=1, .cmask=1);
case INTEL_FAM6_SANDYBRIDGE:
case INTEL_FAM6_SANDYBRIDGE_X:
/* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
/* UOPS_DISPATCHED.THREAD,c=1,i=1 to count stall cycles*/
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =
X86_CONFIG(.event=0xb1, .umask=0x01, .inv=1, .cmask=1);
Only frontend stall is defined for Ivy Bridge
case INTEL_FAM6_IVYBRIDGE:
case INTEL_FAM6_IVYBRIDGE_X:
/* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
No mapping for frontend and for backend stalls for more recent CPUs desktop (Haswell, Broadwell, Skylake, Kaby Lake) and Phi (KNL, KNM):
case INTEL_FAM6_HASWELL_CORE:
case INTEL_FAM6_HASWELL_X:
case INTEL_FAM6_HASWELL_ULT:
case INTEL_FAM6_HASWELL_GT3E:
case INTEL_FAM6_BROADWELL_CORE:
case INTEL_FAM6_BROADWELL_XEON_D:
case INTEL_FAM6_BROADWELL_GT3E:
case INTEL_FAM6_BROADWELL_X:
case INTEL_FAM6_XEON_PHI_KNL:
case INTEL_FAM6_XEON_PHI_KNM:
case INTEL_FAM6_SKYLAKE_MOBILE:
case INTEL_FAM6_SKYLAKE_DESKTOP:
case INTEL_FAM6_SKYLAKE_X:
case INTEL_FAM6_KABYLAKE_MOBILE:
case INTEL_FAM6_KABYLAKE_DESKTOP:
Not defined for old Core2 too (did not check Atoms):
http://elixir.free-electrons.com/linux/v4.11/source/arch/x86/events/intel/core.c#L27
static u64 intel_perfmon_event_map[PERF_COUNT_HW_MAX] __read_mostly =
{
[PERF_COUNT_HW_CPU_CYCLES] = 0x003c,
[PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0,
[PERF_COUNT_HW_CACHE_REFERENCES] = 0x4f2e,
[PERF_COUNT_HW_CACHE_MISSES] = 0x412e,
[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x00c4,
[PERF_COUNT_HW_BRANCH_MISSES] = 0x00c5,
[PERF_COUNT_HW_BUS_CYCLES] = 0x013c,
[PERF_COUNT_HW_REF_CPU_CYCLES] = 0x0300, /* pseudo-encoding */
};
来源:https://stackoverflow.com/questions/36348940/haswell-microarchitecture-dont-have-stalled-cycles-backend-in-perf