Estimating interrupt latency on x86 CPUs

借酒劲吻你 2021-02-04 15:37

I am looking for information that can help in estimating interrupt latencies on x86 CPUs. I found a very useful paper at "datasheets.chipdb.org/Intel/x86/386/technote/2153.pdf".

2 answers
  • 2021-02-04 16:03

    In general, there is no guaranteed upper bound on interrupt latency. Consider the following example:

    • Maskable interrupts are disabled by executing the cli instruction, which clears the IF flag.
    • A processor is transitioned to the C1 sleep state by executing the hlt instruction.
    • A maskable interrupt occurs whose affinity specifies that it can only be handled on that processor.

    In this case, the processor will not handle the interrupt until a non-maskable interrupt occurs to wake up the processor and the IF flag is set again to enable handling of maskable interrupts.

    The interrupt latency for any interrupt (including non-maskable interrupts) can be on the order of hundreds of microseconds if all the processors that are supposed to handle the interrupt are in a very deep sleep state. On my Haswell processor, the wakeup latency of the C7 state is 133 usecs. If this is an issue for you, you can use the Linux kernel parameter intel_idle.max_cstate (if the intel_idle driver is used, which is the default on Intel processors) or processor.max_cstate (for the acpi_idle driver) to limit the deepest C-state. You can tell the kernel to never put any core to sleep using idle=poll, which may minimize the interrupt latency on an idle core, assuming of course that the frequency is not reduced due to thermal throttling. Using a polling loop can also reduce the maximum turbo frequency of all cores, which may reduce the overall performance of the system.
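    To see which idle states the kernel exposes on a given machine and what exit latency it advertises for each, the cpuidle entries in sysfs can be read. The following is a minimal Python sketch, assuming a Linux system that exposes `/sys/devices/system/cpu/cpu0/cpuidle/state*/name` and `latency` (the latter in microseconds). The function name `read_cstates` is made up for illustration, and the advertised values are what the driver reports, not measurements.

```python
from pathlib import Path


def read_cstates(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return [(state_name, exit_latency_us), ...] ordered by state index.

    Assumes the Linux cpuidle sysfs layout: one stateN directory per idle
    state, each containing 'name' and 'latency' (microseconds) files.
    """
    states = []
    for state in Path(cpuidle_dir).glob("state[0-9]*"):
        index = int(state.name[len("state"):])
        name = (state / "name").read_text().strip()
        latency_us = int((state / "latency").read_text().strip())
        states.append((index, name, latency_us))
    # Sort numerically so that state10 does not come before state2.
    return [(name, latency_us) for _, name, latency_us in sorted(states)]


if __name__ == "__main__":
    for name, latency_us in read_cstates():
        print(f"{name}: {latency_us} us")
```

    Comparing the output before and after booting with a lower max_cstate value is one way to check that the limit took effect.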

    On an active core (in state C0), a hardware interrupt is only accepted when the core is in an interruptible state. This state occurs at instruction boundaries, except for repeated string instructions, which can also be interrupted between iterations. Intel does not provide an upper bound on the number of instructions that are retired before a pending interrupt is accepted. A reasonable implementation may stop issuing uops into the ROB (at an instruction boundary) and wait until all uops in the ROB retire before beginning the execution of the microcode routine for invoking an interrupt handler. In such an implementation, the interrupt latency depends on the time it takes to retire all of the pending uops. High-latency instructions such as loads, complex floating-point arithmetic, and locked instructions can easily push the interrupt latency to the order of hundreds of nanoseconds. However, if one of the pending uops requires a microcode assist for any reason (or for some specific reasons), the processor may choose to flush the instruction and all later instructions instead of invoking the assist. This implementation improves performance and power consumption at the cost of increased interrupt latency.

    In another implementation tuned for minimizing interrupt latency, all in-flight instructions are immediately flushed without retiring anything. However, all of these flushed instructions, which had already gone through the pipeline and some of which might already have completed, need to be fetched again and go through the pipeline again when the interrupt handler returns. This results in reduced performance and increased power consumption.
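    The trade-off between these two hypothetical implementations can be illustrated with a toy model. This is only a sketch under strong simplifying assumptions, not a description of any real microarchitecture: each in-flight uop is represented by the number of cycles it still needs to complete, uops complete independently of each other, and retirement itself costs nothing. The names `drain_policy` and `flush_policy` and the sample numbers are made up for illustration.

```python
def drain_policy(remaining_cycles):
    """Wait for all in-flight uops to retire before entering the handler.

    Returns (extra_latency_cycles, uops_to_reexecute). Under the toy
    assumption that uops complete independently, the wait is bounded by
    the slowest in-flight uop.
    """
    return (max(remaining_cycles, default=0), 0)


def flush_policy(remaining_cycles):
    """Discard all in-flight uops immediately.

    No waiting, but every flushed uop has to be fetched and executed
    again after the handler returns.
    """
    return (0, len(remaining_cycles))


if __name__ == "__main__":
    # Hypothetical ROB contents: a few short ALU uops and one load that
    # misses in the caches (250 cycles left).
    in_flight = [1, 1, 3, 250, 2]
    print("drain:", drain_policy(in_flight))
    print("flush:", flush_policy(in_flight))
```

    In this model a single long-latency uop dominates the drain latency, while the flush cost grows with the number of uops in flight.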

    Hardware interrupts drain the store buffer and the write-combining buffers on Intel and AMD x86 processors. See: Interrupting an assembly instruction while it is operating.

    A paper from Intel titled Reducing Interrupt Latency Through the Use of Message Signaled Interrupts discusses a methodology to measure the latency of an interrupt from a PCIe device. This paper uses the term "interrupt latency" to mean the same thing as "interrupt response time" from the paper you mentioned. You need to somehow take a timestamp at the time the interrupt reaches the processor and then another timestamp at the very beginning of the interrupt handler. An approximation of the interrupt latency can be calculated by subtracting the two. The problem is of course getting the first timestamp (also in a way that is comparable to the second timestamp). The Intel paper proposes to use a PCIe analyzer, which consists of a PCIe device and an application that records all PCIe traffic with timestamps between the device and the CPU. They use a device driver to write to an MMIO location mapped to the device from the interrupt handler to create the second timestamp.
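    The subtraction step can be sketched as follows. This is a minimal Python illustration, assuming both sets of timestamps have already been brought into the same clock domain and unit (nanoseconds here), which is the hard part in practice. The function names and the sample values are hypothetical and do not come from the Intel paper.

```python
import statistics


def interrupt_latencies_ns(arrival_ns, handler_entry_ns):
    """Pairwise latency: handler-entry timestamp minus arrival timestamp."""
    if len(arrival_ns) != len(handler_entry_ns):
        raise ValueError("timestamp lists must have the same length")
    latencies = [h - a for a, h in zip(arrival_ns, handler_entry_ns)]
    if any(lat < 0 for lat in latencies):
        # A negative value means the two clocks are not comparable.
        raise ValueError("handler entry precedes arrival")
    return latencies


def summarize(latencies_ns):
    """Min, median and max; the maximum matters for worst-case estimates."""
    return {
        "min": min(latencies_ns),
        "median": statistics.median(latencies_ns),
        "max": max(latencies_ns),
    }


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    arrival = [1_000, 5_000, 9_000]
    entry = [1_800, 5_650, 142_000]
    print(summarize(interrupt_latencies_ns(arrival, entry)))
```

    Reporting the maximum alongside the median is useful here because rare events such as a wakeup from a deep C-state show up only in the tail.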

  • 2021-02-04 16:21

    If Agner Fog's optimization manuals (supplemented with the Intel developer manuals) don't have anything, it's unlikely that anyone or anything else will (save for some internal Intel/AMD data): http://www.agner.org/optimize/
