“Unexplainable” core dump

囚心锁ツ 2021-02-01 16:51

I've seen many core dumps in my life, but this one has me stumped.

Context:

  • multi-threaded Linux/x86_64 program running on a cluster of AMD Barcelona CPUs
2 Answers
  • 2021-02-01 17:32

    I once saw an "illegal opcode" crash right in the middle of an instruction while working on a Linux port. Long story short, Linux subtracts from the instruction pointer in order to restart a syscall, and in my case that subtraction was happening twice (when two signals arrived at the same time), so the instruction pointer ended up before the syscall instruction rather than on it.

    So that's one possible culprit: the kernel fiddling with your instruction pointer. There may be some other cause in your case.
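The double rewind described above can be sketched as plain arithmetic. This is only an illustration under stated assumptions: it uses the x86-64 case, where the `syscall` instruction is 2 bytes long (`0F 05`), while the port in the story may have been a different architecture; the address `0x401000` and the function name `restart_syscall` are hypothetical.

```python
SYSCALL_LEN = 2  # x86-64 `syscall` is encoded as 0F 05 (assumed architecture)


def restart_syscall(rip: int, times: int = 1) -> int:
    """Model rewinding the saved instruction pointer `times` times."""
    return rip - SYSCALL_LEN * times


syscall_addr = 0x401000                  # hypothetical address of the syscall
rip_after = syscall_addr + SYSCALL_LEN   # saved rip points just past it

once = restart_syscall(rip_after, 1)     # intended: back on the syscall
twice = restart_syscall(rip_after, 2)    # the bug: rewound one time too many

print(hex(once))
print(hex(twice))
```

With a single rewind the instruction pointer lands back on the syscall. With two it lands 2 bytes earlier, which may well be the middle of whatever instruction precedes it.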

    Bear in mind that the processor will decode whatever bytes the instruction pointer lands on, even if they are data or the middle of another instruction. So the processor may have executed the "instruction" at 0x17bd9fa, moved on to 0x17bd9fd, and only then generated an illegal opcode exception. (I just made that number up, but experimenting with a disassembler can show you where the processor might have "entered" the instruction stream.)
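To make the "entering the instruction stream at the wrong place" idea concrete, here is a toy sketch rather than a real disassembler. It recognizes only two x86 encodings (`B8 imm32` for `mov eax, imm32` and `0F 0B` for `ud2`), and the byte sequence is made up for illustration, not taken from the crash. For real work, use an actual disassembler such as `objdump` or gdb.

```python
def decode(code: bytes, offset: int):
    """Toy decoder for two x86 encodings; returns (text, length in bytes)."""
    op = code[offset]
    if op == 0xB8:  # B8 imm32 -> mov eax, imm32 (little-endian immediate)
        imm = int.from_bytes(code[offset + 1:offset + 5], "little")
        return f"mov eax, {imm:#x}", 5
    if code[offset:offset + 2] == b"\x0f\x0b":  # 0F 0B -> ud2
        return "ud2", 2
    return f"(not handled by this toy: {op:#04x})", 1


code = bytes.fromhex("b80f0b0000")  # hypothetical bytes, not from the crash

intended = decode(code, 0)    # decoding from the real instruction boundary
misaligned = decode(code, 1)  # decoding from one byte into the instruction

print(intended)
print(misaligned)
```

Decoded from offset 0, the bytes are a harmless `mov`. Decoded from offset 1, the immediate's bytes read as `ud2`, which raises an invalid-opcode exception by design.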

    Happy debugging!

  • 2021-02-01 17:40

    So, unlikely as it may seem, we appear to have hit an actual bona-fide CPU bug.

    http://support.amd.com/us/Processor_TechDocs/41322_10h_Rev_Gd.pdf has erratum #721:

    721 Processor May Incorrectly Update Stack Pointer

    Description

    Under a highly specific and detailed set of internal timing conditions,
    the processor may incorrectly update the stack pointer after a long series
    of push and/or near-call instructions, or a long series of pop 
    and/or near-return instructions. The processor must be in 64-bit mode for
    this erratum to occur.
    

    Potential Effect on System

    The stack pointer value jumps by a value of approximately 1024, either in
    the positive or negative direction.
    This incorrect stack pointer causes unpredictable program or system behavior,
    usually observed as a program exception or crash (for example, a #GP or #UD).
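When inspecting a core dump with this erratum in mind, one rough check is whether the stack pointer is off from where the surrounding frames say it should be by about 1024 bytes in either direction. The sketch below assumes you have already worked out an expected value by hand. The function name, the addresses, and the `slack` tolerance are all hypothetical; the erratum says only "approximately 1024".

```python
def sp_jump_near_1024(expected_rsp: int, observed_rsp: int, slack: int = 64) -> bool:
    """True if the stack pointer is off by roughly 1024 bytes, up or down.

    `slack` is an arbitrary tolerance chosen for this sketch.
    """
    delta = abs(observed_rsp - expected_rsp)
    return abs(delta - 1024) <= slack


expected = 0x7FFF_F000_1000  # hypothetical value derived from the frames

suspicious_up = sp_jump_near_1024(expected, expected + 1024)
suspicious_down = sp_jump_near_1024(expected, expected - 1016)
ordinary = sp_jump_near_1024(expected, expected + 16)

print(suspicious_up, suspicious_down, ordinary)
```

A match is only a hint that the erratum is worth considering, since ordinary stack corruption can produce similar offsets.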
    