How to Configure and Sample Intel Performance Counters In-Process

蓝咒 提交于 2019-12-11 00:53:16

问题


In a nutshell, I'm trying to achieve the following inside a userland benchmark process (pseudo-code, assuming x86_64 and a UNIX system):

results[] = ...
for (iteration = 0; iteration < num_iterations; iteration++) {
    pctr_start = sample_pctr();
    the_benchmark();
    pctr_stop = sample_pctr();
    results[iteration] = pctr_stop - pctr_start;
}

FWIW, the performance counter I am thinking of using is CPU_CLK_UNHALTED.THREAD_ALL, to read the number of core cycles independent of clock frequency changes (In an earlier question I had been planning to use the TSC register for this, but alas, that is not what this register measures at all).

My initial intention was to use inline assembler to first configure a counter using WRMSR, then to read the counter using RDPMC inside sample_pctr().

I stumbled at the first hurdle, as writing MSRs requires kernel privileges. It seems like you can in fact read the counters from user space (if configured correctly), but the act of configuring the counter (with an MSR) needs to be undertaken by the kernel.

Does anyone know a lightweight way to ask the kernel to configure the a performance counters from user-space so that I can then use RDPMC from within my benchmark harness?

Stuff I've looked into/thought about:

  • Perf tools for Linux. Seems to be geared up for sampling over the whole lifetime of a process, not within a process as specific points (before and after each iteration).
  • Use perf syscalls directly (i.e. perf_event_open). Looks like the counter value will only update periodically (using a sample rate) or after the counter exceeds a threshold. I need the counter value precisely at the moment I ask. This is why RDPMC seemed so attractive. I imagine that sampling frequently will itself skew the performance counter readings.
  • PAPI builds on perf, so probably inherits the above problem.
  • Write a kernel module -- too much effort, too error prone.

Ideally I would like a solution which works on OpenBSD and Linux, but somehow I think that is a tall order. Perhaps just for Linux for now.

Any help is most appreciated. Thanks.

EDIT: I just found the Linux msr device node, which would probably suffice. I'll leave the question up in case a better answer shows up.


回答1:


It seems the best way -- for Linux at least -- is to use the msr device node.

You simply open a device node, seek to the address of the MSR required, and read or write 8 bytes.

OpenBSD is harder, since (at the time of writing) there is no user-space proxy to the MSRs. So you would need to write a kernel module or implement a sysctl by hand.



来源:https://stackoverflow.com/questions/39021662/how-to-configure-and-sample-intel-performance-counters-in-process

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!