x86 | 易学教程

Building backward compatible binaries with newer CPU instructions support

Question: What is the best way to implement multiple versions of the same function that uses specific CPU instructions if they are available (tested at run time), or falls back to a slower implementation if not? For example, x86 BMI2 provides a very useful PDEP instruction. How would I write C code that tests BMI2 availability on the executing CPU at startup and then uses one of two implementations: one that calls _pdep_u64 (available with -mbmi2), and another that does bit manipulation "by
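
A minimal sketch, assuming GCC or Clang on x86-64: only the BMI2 function is compiled with __attribute__((target("bmi2"))), so the rest of the file needs no -mbmi2; __builtin_cpu_supports("bmi2") selects the implementation once at startup; and a portable loop is the fallback. The names pdep_init, pdep_u64 and pdep_u64_generic are hypothetical, chosen for illustration.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_BMI2_PATH 1
#endif

/* Portable fallback: scatter the low bits of src, one at a time, into
   the positions of the set bits of mask (lowest mask bit first). */
static uint64_t pdep_u64_generic(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        if (src & bit)
            result |= mask & -mask;  /* lowest remaining set bit of mask */
        mask &= mask - 1;            /* clear that bit */
    }
    return result;
}

#ifdef HAVE_BMI2_PATH
/* Only this function is compiled for BMI2; callers are not. */
static __attribute__((target("bmi2"))) uint64_t
pdep_u64_bmi2(uint64_t src, uint64_t mask)
{
    return _pdep_u64(src, mask);
}
#endif

typedef uint64_t (*pdep_fn)(uint64_t, uint64_t);

/* Selected implementation; defaults to the portable one. */
static pdep_fn pdep_impl = pdep_u64_generic;

/* Call once at startup, before any thread uses pdep_u64(). */
void pdep_init(void)
{
#ifdef HAVE_BMI2_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("bmi2"))
        pdep_impl = pdep_u64_bmi2;
#endif
}

uint64_t pdep_u64(uint64_t src, uint64_t mask)
{
    return pdep_impl(src, mask);
}
```

Calling pdep_init() at the top of main keeps the check to a single CPUID-based test; after that every call is one indirect jump. GCC's ifunc attribute is another way to make the same choice, at load time instead of in user code.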

What is “Code” in Linux Kernel crash messages?

Question: I have the following stack trace and crash information after the Linux kernel failed to load:
[ 3.684670] ------------[ cut here ]------------
[ 3.695507] Bad FPU state detected at fpu__clear+0x91/0xc2, reinitializing FPU registers.
[ 3.695508] traps: No user code available.
[ 3.704745] invalid opcode: 0000 [#1] PREEMPT
[ 3.715304] CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.50-android-x86-geeb7e76-dirty #1
[ 3.724594] Hardware name: AAEON UP-APL01/UP-APL01, BIOS UPA1AM21 09/01/2017
[ 3

How does “+&r” differ from “+r”?

Question: GCC's inline assembler recognizes the output constraints =r and =&r. These make sense to me: =r lets the compiler reuse an input register for the output. However, GCC's inline assembler also recognizes the constraints +r and +&r. These make less sense to me. After all, isn't the distinction between +r and +&r a distinction without a difference? Does +r alone not suffice to tell the compiler to reserve a register for the sole use of a single variable? For example, what is wrong with the
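
A minimal sketch, assuming GCC or Clang extended asm on x86 (other targets use a plain C fallback), of a case where "+&r" matters: the template writes %0 before it reads %1, and the input x holds the same value as acc on entry. With plain "+r", the compiler assumes all inputs are consumed before any output is written, so it may legally give x the same register as acc; the second add would then read the already-doubled value and return 4*x. "+&r" forbids that sharing. The function name triple is hypothetical.

```c
/* Hypothetical helper: computes 3 * x with an asm template that
   writes operand %0 before it reads operand %1. */
static unsigned long triple(unsigned long x)
{
    unsigned long acc = x;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("add %0, %0\n\t"  /* acc += acc: %0 is written here...     */
            "add %1, %0"      /* ...and %1 is read only afterwards     */
            : "+&r"(acc)      /* early clobber: %0 never shares a      */
                              /* register with any input operand       */
            : "r"(x)
            : "cc");
#else
    acc = acc + acc + x;      /* portable equivalent on other targets */
#endif
    return acc;
}
```

If every input is read before %0 is first written, the early-clobber marker buys nothing and plain "+r" is enough.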

MFENCE/SFENCE/etc “serialize memory but not instruction execution”?

Question: Intel's System Programming Guide, section 8.3, states regarding MFENCE/SFENCE/LFENCE: "The following instructions are memory-ordering instructions, not serializing instructions. These drain the data memory subsystem. They do not serialize the instruction execution stream." I'm trying to figure out why this matters. In multi-threaded code, writes/reads to memory are exactly what need to happen in a well-defined order. Of course, the order in which I/O happens might matter, but I/O
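
A sketch, assuming C11 <stdatomic.h> and POSIX threads, of the classic store-buffering litmus test, a case where MFENCE's memory ordering (not instruction serialization) is exactly what is needed. x86 lets a later load complete before an earlier store has left the store buffer, so without fences both threads can read 0. With atomic_thread_fence(memory_order_seq_cst) between each store and load, C11 forbids r1 == 0 && r2 == 0; on x86-64, GCC and Clang typically emit MFENCE (or an equivalent locked instruction) for that fence. The helper names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;  /* shared flags, reset before each run */
static int r1, r2;       /* value each thread observed */

static void *thread_a(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    /* Full fence (typically MFENCE on x86-64): keeps the load below
       from being reordered before the store above. */
    atomic_thread_fence(memory_order_seq_cst);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the test `iterations` times; returns how many runs ended with
   r1 == 0 && r2 == 0 (forbidden by the fences), or -1 on error. */
static int count_forbidden_outcomes(int iterations)
{
    int forbidden = 0;
    for (int i = 0; i < iterations; i++) {
        pthread_t ta, tb;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        if (pthread_create(&ta, NULL, thread_a, NULL) != 0)
            return -1;
        if (pthread_create(&tb, NULL, thread_b, NULL) != 0) {
            pthread_join(ta, NULL);
            return -1;
        }
        pthread_join(ta, NULL);
        pthread_join(tb, NULL);
        if (r1 == 0 && r2 == 0)
            forbidden++;  /* would indicate StoreLoad reordering */
    }
    return forbidden;
}
```

Nothing here requires the instruction stream itself to be serialized; the fence only constrains the order in which the store and the load become visible.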

Which is generally faster to test for zero in x86 ASM: “TEST EAX, EAX” versus “TEST AL, AL”?

Question: Which is generally faster for testing the byte in AL for zero / non-zero?
TEST EAX, EAX
TEST AL, AL
Assume a previous "MOVZX EAX, BYTE PTR [ESP+4]" instruction loaded a byte parameter with zero-extension into the rest of EAX, preventing the combine-value penalty that I already know about. So AL=EAX and there are no partial-register penalties for reading EAX. Intuitively, you might think that examining just AL is faster, but I'm betting there are more penalty issues to consider for byte access
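
A sketch, assuming GCC or Clang extended asm on x86 (other targets use a plain C fallback), of the two sequences from the question. Because MOVZX zero-extends the byte into EAX, both TEST forms set ZF identically; the code shows only that they are interchangeable in result, not which one is faster. The function names are hypothetical.

```c
/* Both variants load a byte with MOVZX (zero-extending into EAX),
   set ZF with TEST, and turn ZF into 0/1 with SETZ. */
static int byte_is_zero_test_eax(const unsigned char *p)
{
    unsigned char z;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("movzbl %1, %%eax\n\t"
            "testl %%eax, %%eax\n\t"
            "setz %0"
            : "=q"(z)          /* "q": a register with a byte form */
            : "m"(*p)
            : "eax", "cc");
#else
    z = (*p == 0);
#endif
    return z;
}

static int byte_is_zero_test_al(const unsigned char *p)
{
    unsigned char z;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("movzbl %1, %%eax\n\t"
            "testb %%al, %%al\n\t"
            "setz %0"
            : "=q"(z)
            : "m"(*p)
            : "eax", "cc");
#else
    z = (*p == 0);
#endif
    return z;
}
```

Both TEST encodings are two bytes (84 C0 for TEST AL, AL and 85 C0 for TEST EAX, EAX), so code size does not separate them either.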

How can I determine what architectures gcc supports?

Question: GCC supports a -march switch that allows you to specify the architecture you are targeting, which lets it tune instruction sequences for that platform and use instructions that are available on that platform but not on the "default" or base version of the architecture. For example, -march=skylake tells the compiler to target Skylake CPUs, including using instruction sets available on Skylake such as AVX2. How can I tell what values for -march the local
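
A small sketch of how the effect of -march shows up in C: GCC and Clang predefine feature macros such as __AVX2__ and __BMI2__ for each instruction set that the selected -march (or an explicit -m flag) enables. Compiling this with -march=skylake should add __AVX2__ to the returned list; gcc -march=skylake -dM -E - </dev/null dumps the complete set of predefined macros for comparison. The function name is hypothetical, and only a handful of macros are checked.

```c
/* Returns a space-separated list of selected ISA-extension macros that
   the compiler predefined for the current -march / -m options.
   Adjacent string literals are concatenated after preprocessing. */
static const char *enabled_isa_macros(void)
{
    return ""
#ifdef __SSE4_2__
        " __SSE4_2__"
#endif
#ifdef __AVX__
        " __AVX__"
#endif
#ifdef __AVX2__
        " __AVX2__"
#endif
#ifdef __BMI2__
        " __BMI2__"
#endif
#ifdef __AVX512F__
        " __AVX512F__"
#endif
        ;
}
```

Because the macros are fixed at compile time, they describe the build target, not the CPU the binary later runs on; run-time checks such as __builtin_cpu_supports answer the second question.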
