问题
Say if I am running an ARM simulator using Qemu, is it possible to find the time of execution of a program as it would be on the real ARM processor. In other words if I use functions such as gettimeofday
, in a program running on the simulator, to check the elapsed time, will the elapsed time be given accurately through the cycle-accurate simulation?
回答1:
Investigation in this issue at our company concluded that Qemu (for the ARM) is not cycle accurate. If I remember correctly cycle accuracy is not a goal of Qemu, instead it aims at fast emulation. Beware also that exact timing is dependent on quite unpredictable things like cache hits and misses. It will also depend on the actual architecture chosen. Note that ARM is merely an instruction set IP and several different implementations exist. If in addition an operating system is emulated, things get even more unpredictable.
We use the simulator from ARM to evaluate performance, but even that one is not fully cycle accurate for the latest versions of the ARM architecture.
回答2:
GEM5
I have seen a researcher use gem5 for this. This paper evaluates how accurate it is. And I've created an easy to get started setup on GitHub.
As Bryan mentioned QEMU is designed for speed: only a valid x86 API behavior must be reached, not necessarily with the right number of cycles or in the same pipeline order. This is also called functional emulation.
Furthermore, DRAM memory accesses are assumed to be immediate, and therefore it makes no sense to emulate caches either. And as we know, current CPUs are basically memory latency hiding machines.
Cycle accurate emulators on the other hand, also emulate CPU internals, and are therefore way slower.
The root of the problem is of course the under documented performance features of processors, which vendors don't release to prevent intellectual property leakage.
GEM5 appears to implement a generic version of common CPU internals, so it should be more cycle accurate than functional emulators, but true cycle accurate emulation is likely impossible without insider knowledge.
Third party emulation implementors must then reverse engineer CPU performance from experiments and existing documentation.
Some of the key "internals" are cache, pipeline and branch prediction.
Related:
- Question that asks how cycle accurate emulators are possible at all: How can CAS simulators like PTLsim achieve cycle accurate simulation of x86 hardware?
- ARM Cycle-Accurate Simulator
来源:https://stackoverflow.com/questions/17454955/can-you-check-performance-of-a-program-running-with-qemu-simulator