问题
I found mtrace by Dr.Clements. Although it is useful, it doesn't work normally in the situation I need. I intend to use the record to understand memory access pattern in different scenario.
Can someone share the related experience? Any suggestion will be appreciated.
0313 Updated: I'm trying to use qemu-mtrace to boot ubuntu 16.04 with linux-mtrace(3.8.0), but it only show several error message and terminated. Hope some tool be able to log every access.
$ ./qemu-system-x86_64 -mtrace-enable -mtrace-file mtrace.out -hda ubuntu.img -m 1024
Error: mtrace_entry_ascope (exit, syscall:xx) with no stack tag!
mtrace_entry_register: mtrace_host_addr failed (10)
mtrace_inst_exec: bad call 140734947607728
Aborted (core dumped)
回答1:
There is perf mem
tool implemented for some modern x86/EM64T CPUs (probably, Intel-only; Ivy and newer desktop/server cpus). Man page of perf mem
is http://man7.org/linux/man-pages/man1/perf-mem.1.html and same text in kernel docs dir: http://lxr.free-electrons.com/source/tools/perf/Documentation/perf-mem.txt. The text is incomplete; the best docs are sources: tools/perf/builtin-mem.c & partially in tools/perf/builtin-report.c. No details in https://perf.wiki.kernel.org/index.php/Tutorial.
Unlike qemu-mtrace
it will not log every memory access, but only every Nth access where N is like 10000 or 100000. But it works with native speed and low overhead. Use perf mem record ./program
to record pattern; try to add -a
or -C cpulist
for system-wide or global sampling for some CPU cores. There is no way to log (trace) all and every memory access from inside the system (tool should write info to memory and will log this access - this is infinite recursion with finite memory), but there are very costly proprietary system-specific external tracing solutions like JTAG or SDRAM sniffer ($5k or more).
The tools of perf mem
where added around 2013 (3.10 version of linux kernel), there are several results of searching perf mem on lwn: https://lwn.net/Articles/531766/
With this patch, it is possible to sample (not trace) memory accesses (load, store). For loads, the instruction and data addresses are captured along with the latency and data source. For stores, the instruction and data addresses are capture along with limited cache and TLB information.
The current patches implement the feature on Intel processors starting with Nehalem. The patches leverage the PEBS Load Latency and Precise Store mechanisms. Precise Store is present only on Sandy Bridge and Ivy Bridge based processors.
Physical address sampling support added: https://lwn.net/Articles/555890/ (perf mem --phys-addr -t load rec
); (there is also bit related 2016 year c2c
perf tool "to track down cacheline contention": https://lwn.net/Articles/704125/ with examples https://joemario.github.io/blog/2016/09/01/c2c-blog/)
Some random slides on perf mem
:
- http://indico.cern.ch/event/280897/contributions/1628882/attachments/515361/711133/SE-CERN_PMU_workshop_2013.pdf#page=4
- http://www.linuxtag.org/2013/fileadmin/www.linuxtag.org/slides/Arnaldo_Melo_-_Linux__perf__tools__Overview_and_Current_Developments.e323.pdf#page=10
- https://people.netfilter.org/pablo/netdev0.1/slides/sowa-perf-analytics.pdf#page=19
Some info on decoding perf mem -D report
: perf mem -D report
# PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL 2054 2054 0xffffffff811186bf 0x016ffffe8fbffc804b0 49 0x68100842 /lib/modules/3.12.23/build/vmlinux:perf_event_aux_ctx
What does "ADDR", "DSRC", "SYMBOL" mean?
(answered by the same user as in this answer)
- IP - PC of the load/store instruction;
- SYMBOL - name of function, containing this instruction (IP);
- ADDR - virtual memory address of data, requested by load/store (if there was no --phys-data option)
- DSRC - "Decoded Source".
There is also sorting to get some basic stats: perf mem rep --sort=mem
- http://thread.gmane.org/gmane.linux.kernel.perf.user/1438
Other tools.. There is (slow) cachegrind emulator based on valgrind for simulating cache memory for userspace prograns - "7.2 Simulating CPU Caches" of https://lwn.net/Articles/257209/. There should also be something for low-level (slowest) models related to DRAMsim/DRAMsim2 http://eng.umd.edu/~blj/dramsim/
来源:https://stackoverflow.com/questions/42748490/logging-memory-access-footprint