Logging Memory Access Footprint

Submitted by 老子叫甜甜 on 2019-12-04 13:32:01
osgx

The perf mem tool is implemented for some modern x86/EM64T CPUs (probably Intel-only; Ivy Bridge and newer desktop/server CPUs). Its man page is at http://man7.org/linux/man-pages/man1/perf-mem.1.html, and the same text is in the kernel docs directory: http://lxr.free-electrons.com/source/tools/perf/Documentation/perf-mem.txt. That text is incomplete; the best documentation is the source: tools/perf/builtin-mem.c and, partially, tools/perf/builtin-report.c. The tutorial at https://perf.wiki.kernel.org/index.php/Tutorial has no details on it.

Unlike qemu-mtrace, it does not log every memory access, only every Nth access, where N is something like 10000 or 100000. In exchange, it runs at native speed with low overhead. Use perf mem record ./program to record the access pattern; try adding -a for system-wide sampling or -C cpulist to sample only some CPU cores. There is no way to log (trace) each and every memory access from inside the system: the tool has to write its log to memory, and that write is itself an access to be logged, which is infinite recursion with finite memory. There are, however, very costly proprietary, system-specific external tracing solutions such as JTAG or an SDRAM sniffer ($5k or more).
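To make the difference between sampling and tracing concrete, here is a minimal Python sketch. It is not how perf or the hardware implements sampling; it only keeps every Nth access from a synthetic trace and scales the per-page sample counts by N to estimate the true counts. The trace, the 4096-byte page size, the 90/10 split between pages, and all function names are assumptions made up for illustration.

```python
import random
from collections import Counter

PAGE_SIZE = 4096  # assumed page size, for illustration only


def sample_every_nth(accesses, period):
    """Keep every `period`-th access and drop the rest (sampling, not tracing)."""
    return accesses[period - 1::period]


def estimate_page_counts(samples, period):
    """Estimate per-page access counts by scaling sample counts by the period."""
    counts = Counter(addr // PAGE_SIZE for addr in samples)
    return {page: n * period for page, n in counts.items()}


# Synthetic trace (hypothetical workload): about 90% of accesses go to
# page 0 and about 10% go to page 1.
rng = random.Random(0)
trace = []
for _ in range(200_000):
    page = 1 if rng.random() < 0.1 else 0
    trace.append(page * PAGE_SIZE + rng.randrange(0, PAGE_SIZE, 8))

samples = sample_every_nth(trace, period=1000)
estimates = estimate_page_counts(samples, period=1000)

for page, count in sorted(estimates.items()):
    print(f"page {page}: ~{count} accesses (estimated from samples)")
```

Only 200 of the 200000 accesses are kept, so the estimates are approximate, but the hot page still stands out. A strictly periodic access pattern can alias with a fixed sampling period, which is why the synthetic trace here is randomized.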

The perf mem tools were added around 2013 (Linux kernel version 3.10). Searching for perf mem on LWN gives several results, such as https://lwn.net/Articles/531766/, which says:

With this patch, it is possible to sample (not trace) memory accesses (load, store). For loads, the instruction and data addresses are captured along with the latency and data source. For stores, the instruction and data addresses are captured along with limited cache and TLB information.

The current patches implement the feature on Intel processors starting with Nehalem. The patches leverage the PEBS Load Latency and Precise Store mechanisms. Precise Store is present only on Sandy Bridge and Ivy Bridge based processors.

Physical address sampling support was added: https://lwn.net/Articles/555890/ (perf mem --phys-addr -t load rec). There is also the somewhat related perf c2c tool from 2016, meant "to track down cacheline contention": https://lwn.net/Articles/704125/, with examples at https://joemario.github.io/blog/2016/09/01/c2c-blog/

Some random slides on perf mem:

Some info on decoding the output of perf mem -D report:

 # PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL
 2054  2054 0xffffffff811186bf 0x016ffffe8fbffc804b0    49 0x68100842 /lib/modules/3.12.23/build/vmlinux:perf_event_aux_ctx

What does "ADDR", "DSRC", "SYMBOL" mean?

(answered by the same user as in this answer)

  • IP - PC of the load/store instruction;
  • SYMBOL - name of function, containing this instruction (IP);
  • ADDR - virtual memory address of the data requested by the load/store (if the --phys-data option was not used);
  • DSRC - "Decoded Source".
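The field list above can be turned into a small parser. The following is a Python sketch that splits one line of the dump shown earlier into named fields. The `MemSample` type and both function names are hypothetical. The DSRC bit layout (mem_op:5, mem_lvl:14, mem_snoop:5, mem_lock:2, mem_dtlb:7) is an assumption based on my reading of `union perf_mem_data_src` in the kernel's include/uapi/linux/perf_event.h; newer kernels define extra fields in the upper bits, so check the header for your kernel version.

```python
from typing import NamedTuple


class MemSample(NamedTuple):
    pid: int
    tid: int
    ip: int       # address of the load/store instruction
    addr: int     # data address requested by the load/store
    weight: int   # LOCAL WEIGHT column
    dsrc: int     # raw data-source bitfield
    dso: str      # binary or kernel image containing the instruction
    symbol: str   # function containing the instruction


def parse_dump_line(line: str) -> MemSample:
    """Split one whitespace-separated line of `perf mem -D report` output."""
    pid, tid, ip, addr, weight, dsrc, location = line.split(None, 6)
    # The SYMBOL column is printed as "<dso>:<symbol>" in the example line.
    dso, _, symbol = location.strip().rpartition(":")
    return MemSample(int(pid), int(tid), int(ip, 16), int(addr, 16),
                     int(weight), int(dsrc, 16), dso, symbol)


def split_dsrc(dsrc: int) -> dict:
    """Split DSRC into raw bitfields (layout is an assumption, see above)."""
    fields = (("mem_op", 5), ("mem_lvl", 14), ("mem_snoop", 5),
              ("mem_lock", 2), ("mem_dtlb", 7))
    out, shift = {}, 0
    for name, width in fields:
        out[name] = (dsrc >> shift) & ((1 << width) - 1)
        shift += width
    return out


line = (" 2054  2054 0xffffffff811186bf 0x016ffffe8fbffc804b0    49 "
        "0x68100842 /lib/modules/3.12.23/build/vmlinux:perf_event_aux_ctx")
sample = parse_dump_line(line)
bits = split_dsrc(sample.dsrc)
print(sample.symbol, hex(sample.ip), sample.weight)
print({name: hex(value) for name, value in bits.items()})
```

The sketch returns the bitfields as raw integers. Mapping them to names such as "load" or "L3 hit" requires the PERF_MEM_* constants from the same header.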

Sorting is also available to get some basic statistics: perf mem rep --sort=mem (see http://thread.gmane.org/gmane.linux.kernel.perf.user/1438).

Other tools: there is the (slow) cachegrind emulator, based on valgrind, which simulates cache memory for userspace programs; see "7.2 Simulating CPU Caches" in https://lwn.net/Articles/257209/. There should also be something for low-level (and slowest) models, related to DRAMsim/DRAMsim2: http://eng.umd.edu/~blj/dramsim/
