x86

In x86 assembly, when should I use global variables instead of local variables?

Submitted by 梦想的初衷 on 2021-02-08 03:39:55
Question: I am creating some small programs with x86 assembly, and it's my first time using a low-level language, so I'm not used to it. In high-level languages I rarely use global variables, but I've seen a lot of tutorials using global variables in assembly, so I'm not sure when to use global variables instead of local variables. By global variables I mean data created in the .bss and .data segments, and by local variables I mean data allocated on the stack of the current procedure, using the stack
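
For illustration, a minimal NASM-style sketch (32-bit; the names counter, buffer and my_proc are only illustrative) of the two kinds of storage the question contrasts: statically allocated data in .data/.bss versus stack locals carved out in a procedure's prologue.

    section .data
    counter:    dd 0              ; global: initialised, lives for the whole program

    section .bss
    buffer:     resb 64           ; global: zero-filled at load time

    section .text
    global my_proc
    my_proc:
        push ebp
        mov  ebp, esp
        sub  esp, 8               ; reserve two 4-byte locals on the stack
        mov  dword [ebp-4], 1     ; "local variable": exists only while my_proc runs
        inc  dword [counter]      ; "global variable": visible to every procedure
        mov  esp, ebp             ; release the locals
        pop  ebp
        ret

The rule of thumb from high-level languages carries over: stack locals for per-call scratch data, .data/.bss for state that must outlive a single call or be shared between procedures.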

Force a migration of a cache line to another core

Submitted by 天涯浪子 on 2021-02-07 22:43:21
Question: In C++ (using any of the low-level intrinsics available on the platform) for x86 hardware (say Intel Skylake, for example), is it possible to send a cache line to another core without forcing the thread on that core to load the line explicitly? My use case is in a concurrent data structure. In it, for some cases a core goes through some places in memory that might be owned by some other core(s) while probing for spots. The threads on those cores are typically blocked on a condition
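
Skylake has no instruction that pushes a line directly into another core's cache; the nearest levers are the flush/write-back instructions, so an experiment in this direction ends up looking roughly like the hedged fragment below (assumes rdi points at a cache-line-aligned slot; this is a sketch, not a complete program).

    ; writer side: publish the payload, then evict the line so a later reader
    ; fetches it from L3/DRAM instead of snooping a Modified line out of this core
        mov      [rdi], rax       ; store the payload
        clflush  [rdi]            ; evict the whole line containing [rdi]
                                  ; (clflushopt/clwb exist on newer parts, but none
                                  ;  of these can target a particular core's cache)
    ; reader side: still an ordinary load; the line is pulled in on demand
        mov      rax, [rdi]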

How does a processor know the end of a program?

Submitted by 血红的双手。 on 2021-02-07 19:54:53
Question: I was wondering how a processor knows when to stop executing a program, or rather, when to stop the "fetch, decode, execute" cycle. I have thought of different ways, but I'm not sure which is correct, or whether they are all wrong. 1- Maybe there is a special instruction at the end, automatically added by the assembler, to let the processor know this is the end. 2- When it reaches invalid memory (but how does it recognize that?). 3- It loops and re-runs the program, but again how does it
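
On a hosted operating system the program itself ends by asking the kernel to terminate it with an exit system call; the CPU never just runs off the end, it simply keeps fetching from other code (the kernel, other processes). A minimal Linux x86 sketch in NASM syntax:

        section .text
        global _start
    _start:
        ; ... program body ...
        mov eax, 1        ; sys_exit: ask the kernel to stop scheduling this process
        xor ebx, ebx      ; exit status 0
        int 0x80          ; control never returns to this program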

C++ latency increases when memory ordering is relaxed

Submitted by 情到浓时终转凉″ on 2021-02-07 17:11:11
Question: I am on Windows 7 64-bit, VS2013 (x64 Release build), experimenting with memory orderings. I want to share access to a container using the fastest synchronization. I opted for atomic compare-and-swap. My program spawns two threads. A writer pushes to a vector and the reader detects this. Initially I didn't specify any memory ordering, so I assume it uses memory_order_seq_cst? With memory_order_seq_cst the latency is 340-380 cycles per op. To try and improve performance I made stores use
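
For reference, on x86-64 the cost difference usually shows up only on the store side: compilers typically emit an implicitly locked xchg (or a plain store followed by mfence) for a memory_order_seq_cst store, while release stores and acquire loads compile to plain mov thanks to x86's strong ordering. A rough sketch of the typical code generation (compiler-dependent, not a complete program):

    ; seq_cst store of eax to [rdi]: full barrier
        xchg  eax, [rdi]          ; implicitly locked; this is the expensive part
    ; release store: a plain store is already ordered strongly enough on x86
        mov   [rdi], eax
    ; acquire (or seq_cst) load: a plain load
        mov   eax, [rdi]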

NASM tutorial uses int 80h, but this isn't working on Windows

Submitted by 孤街醉人 on 2021-02-07 14:16:18
Question: I'm starting NASM assembler after finishing FASM. I'm coding this on a Windows operating system. My code reads:

    section.data ; Constant
    msg: db "Hello World!"
    msg_L: equ $-msg ; Current - msg1
    section.bss ; Variable
    section.text ; Code
    global _WinMain@16
    _WinMain@16:
        mov eax, 4
        mov ebx, 1 ; Where to write it out. Terminal
        mov ecx, msg
        mov edx, msg_L
        int 80h
        mov eax, 1 ; EXIT COMMAND
        mov ebx, 0 ; No error
        int 80h

To compile it and execute I use:

    nasm -f win32 test.asm -o test.o
    ld test.o -o test.exe
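
int 80h is a Linux system-call gateway, so nothing on Windows answers it; a Windows build has to go through the Win32 API instead. A hedged 32-bit sketch of the same program using GetStdHandle/WriteFile/ExitProcess (the decorated names follow the usual 32-bit stdcall convention; the object must be linked against kernel32, e.g. with a 32-bit MinGW gcc):

        global  _main
        extern  _GetStdHandle@4
        extern  _WriteFile@20
        extern  _ExitProcess@4

        section .data
    msg:        db  "Hello World!", 13, 10
    msg_L:      equ $ - msg

        section .bss
    written:    resd 1

        section .text
    _main:
        push    -11                   ; STD_OUTPUT_HANDLE
        call    _GetStdHandle@4       ; console handle comes back in eax
        push    0                     ; lpOverlapped
        push    written               ; lpNumberOfBytesWritten
        push    msg_L                 ; nNumberOfBytesToWrite
        push    msg                   ; lpBuffer
        push    eax                   ; hFile
        call    _WriteFile@20         ; stdcall: the callee pops its arguments
        push    0                     ; exit code
        call    _ExitProcess@4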

VEX prefixes encoding and SSE/AVX MOVUP(D/S) instructions

Submitted by Deadly on 2021-02-07 13:50:22
Question: I'm trying to understand the VEX prefix encoding for the SSE/AVX instructions, so please bear with me if I ask something simple. I have the following related questions. Let's take the MOVUP(D/S) instruction (0F 10). If I follow the 2-byte VEX prefix encoding correctly, the following two instruction encodings produce the same result:

    db 0fh, 10h, 00000000b             ; movups xmm0, xmmword ptr [rax]
    db 0c5h, 11111000b, 10h, 00000000b ; vmovups xmm0, xmmword ptr [rax]

As do these two:

    db 066h, 0fh, 10h,
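
For orientation, the second byte of the 2-byte (C5h) VEX prefix packs four fields, which is why C5 F8 in front of opcode 10h reproduces plain movups. A commented sketch of those fields as encoded in the question's own example (field layout per the Intel SDM):

    ; 2-byte VEX prefix: C5h, then one byte laid out as  R vvvv L pp
    ;   R    (bit 7)    : inverted REX.R extension of ModRM.reg
    ;   vvvv (bits 6-3) : inverted extra source register, 1111b = unused
    ;   L    (bit 2)    : vector length, 0 = 128-bit, 1 = 256-bit
    ;   pp   (bits 1-0) : implied legacy prefix, 00=none, 01=66h, 10=F3h, 11=F2h
    db 0c5h, 11111000b, 10h, 00000000b   ; R=1, vvvv=1111, L=0, pp=00
                                         ; -> vmovups xmm0, xmmword ptr [rax]
    db 0c5h, 11111001b, 10h, 00000000b   ; same, but pp=01 (implied 66h prefix)
                                         ; -> vmovupd xmm0, xmmword ptr [rax]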

Getting the caller's Return Address

Submitted by 我们两清 on 2021-02-07 13:37:44
Question: I am trying to figure out how to grab the return address of a caller in MSVC. I can use _ReturnAddress() to get the return address of my function, but I can't seem to find a way to get the caller's. I've tried using CaptureStackBackTrace, but for some reason it crashes after many, many calls. I would also prefer a solution via inline assembly.

    void my_function(){
        cout << "return address of caller_function: " << [GET CALLER'S RETURN VALUE];
    }
    // imaginary return address: 0x15AF7C0
    void caller
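
One common approach, sketched below in 32-bit assembly, walks one frame up the EBP chain: the saved EBP links to the caller's frame, and the slot just above that holds the caller's own return address. This assumes both functions keep standard EBP frames (e.g. MSVC x86 with /Oy-); it is an illustration of the idea, not a documented MSVC API.

    ; inside my_function, after the standard prologue:
    ;   [ebp+4]     = my_function's own return address (what _ReturnAddress gives)
    ;   [ebp]       = caller's saved ebp
    ;   [[ebp] + 4] = the caller's return address (what the question wants)
    my_function:
        push ebp
        mov  ebp, esp
        mov  eax, [ebp]          ; caller's frame pointer
        mov  eax, [eax + 4]      ; caller's return address -> eax
        pop  ebp
        ret

When frame pointers are omitted, a stack-walking helper such as CaptureStackBackTrace with the current frame skipped is the supported route instead.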

Enable/Disable Hardware Lock Elision

Submitted by 蹲街弑〆低调 on 2021-02-07 13:34:13
Question: I am using glibc version 2.24. It includes a lock-elision path for the pthread_mutex_lock implementation using Transactional Synchronization Extensions such as _xbegin() and _xend(). The hardware is supposed to support lock elision, as the hle CPU flag stands for Hardware Lock Elision, I think. The processor I am using is an Intel(R) Xeon(R) Gold 6130 with the Skylake architecture. First I wanted to disable lock elision, but when I run the program that uses pthread_mutex_lock with perf stat -T to monitor
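
Before touching glibc's elision path, it is worth confirming what the hardware itself reports: HLE and RTM each have their own CPUID bit (leaf 7, sub-leaf 0, EBX bits 4 and 11). A small fragment sketching the check; whether glibc actually uses elision is a separate, glibc-side switch that these instructions do not control.

        mov  eax, 7          ; CPUID leaf 7: structured extended features
        xor  ecx, ecx        ; sub-leaf 0
        cpuid
        bt   ebx, 4          ; CF = 1  ->  HLE supported
        ; bt ebx, 11         ; CF = 1  ->  RTM supported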

Identifying faulting address on General Protection Fault (x86)

Submitted by 馋奶兔 on 2021-02-07 13:13:05
Question: I am trying to write an ISR for the General Protection Fault (#GP, vector 13) on x86. I am unable to figure out from the Intel docs how I can find the faulting address causing the exception. I know that for page fault exceptions (#PF, vector 14) the cr2 register holds the faulting address. Any help is appreciated.

Answer 1: All references I make here are from the AMD64 Architecture Programmer's Manual Volume 2: System Programming, which also describes the legacy protected-mode (i.e., x86) behavior. Figure 8-8
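
For contrast with #PF, a sketch of what a #GP handler actually receives: there is no CR2-style "faulting address" register for vector 13. The CPU pushes an error code (a segment-selector index when the fault is segment-related, otherwise zero), and the address of the faulting instruction is simply the saved CS:EIP in the interrupt frame. The sketch assumes a 32-bit handler entered without a privilege-level change.

    gp_handler:
        ; stack on entry:
        ;   [esp]      error code (selector index if segment-related, else 0)
        ;   [esp+4]    EIP of the faulting instruction
        ;   [esp+8]    CS
        ;   [esp+12]   EFLAGS
        mov  eax, [esp]       ; error code
        mov  edx, [esp+4]     ; address of the instruction that faulted
        ; ... log / handle the fault ...
        add  esp, 4           ; #GP pushes an error code; pop it before returning
        iretd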