x86

In x86 assembly, when should I use global variables instead of local variables?

Submitted by 梦想的初衷 on 2021-02-08 03:39:55
Question: I am creating some small programs with x86 assembly, and it's my first time using a low-level language, so I'm not used to it. In high-level languages I rarely use global variables, but I've seen a lot of tutorials using global variables in assembly, so I'm not sure when to use global variables instead of local variables. By global variables I mean data created in the .bss and .data segments, and by local variables I mean data allocated on the stack of the current procedure, using the stack
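
For illustration, a minimal NASM-style sketch (32-bit; the names counter, buffer and my_proc are only illustrative) of the two kinds of storage the question contrasts: statically allocated data in .data/.bss versus stack locals carved out in a procedure's prologue.

    section .data
    counter:    dd 0              ; global: initialised, lives for the whole program

    section .bss
    buffer:     resb 64           ; global: zero-filled at load time

    section .text
    global my_proc
    my_proc:
        push ebp
        mov  ebp, esp
        sub  esp, 8               ; reserve two 4-byte locals on the stack
        mov  dword [ebp-4], 1     ; "local variable": exists only while my_proc runs
        inc  dword [counter]      ; "global variable": visible to every procedure
        mov  esp, ebp             ; release the locals
        pop  ebp
        ret

The rule of thumb from high-level languages carries over: stack locals for per-call scratch data, .data/.bss for state that must outlive a single call or be shared between procedures.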

Force a migration of a cache line to another core

Submitted by 天涯浪子 on 2021-02-07 22:43:21
Question: In C++ (using any of the low-level intrinsics available on the platform) for x86 hardware (say Intel Skylake, for example), is it possible to send a cache line to another core without forcing the thread on that core to load the line explicitly? My use case is in a concurrent data structure. In it, for some cases a core goes through some places in memory that might be owned by some other core(s) while probing for spots. The threads on those cores are typically blocked on a condition
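
Skylake has no instruction that pushes a line directly into another core's cache; the nearest levers are the flush/write-back instructions, so an experiment in this direction ends up looking roughly like the hedged fragment below (assumes rdi points at a cache-line-aligned slot; this is a sketch, not a complete program).

    ; writer side: publish the payload, then evict the line so a later reader
    ; fetches it from L3/DRAM instead of snooping a Modified line out of this core
        mov      [rdi], rax       ; store the payload
        clflush  [rdi]            ; evict the whole line containing [rdi]
                                  ; (clflushopt/clwb exist on newer parts, but none
                                  ;  of these can target a particular core's cache)
    ; reader side: still an ordinary load; the line is pulled in on demand
        mov      rax, [rdi]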

How does a processor know the end of a program?

Submitted by 血红的双手。 on 2021-02-07 19:54:53
Question: I was wondering how a processor knows when to stop executing a program, or rather, when to stop the "fetch, decode, execute" cycle. I have thought of different ways, but I'm not sure which is correct, or whether they are all wrong. 1- Maybe there is a special instruction at the end, automatically added by the assembler, to let the processor know this is the end. 2- When it reaches invalid memory (but how does it recognize that?). 3- It loops and re-runs the program, but again how does it
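
On a hosted operating system the program itself ends by asking the kernel to terminate it with an exit system call; the CPU never just runs off the end, it simply keeps fetching from other code (the kernel, other processes). A minimal Linux x86 sketch in NASM syntax:

        section .text
        global _start
    _start:
        ; ... program body ...
        mov eax, 1        ; sys_exit: ask the kernel to stop scheduling this process
        xor ebx, ebx      ; exit status 0
        int 0x80          ; control never returns to this program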

C++ latency increases when memory ordering is relaxed

Submitted by 情到浓时终转凉″ on 2021-02-07 17:11:11
Question: I am on Windows 7 64-bit, VS2013 (x64 Release build), experimenting with memory orderings. I want to share access to a container using the fastest synchronization. I opted for atomic compare-and-swap. My program spawns two threads. A writer pushes to a vector and the reader detects this. Initially I didn't specify any memory ordering, so I assume it uses memory_order_seq_cst? With memory_order_seq_cst the latency is 340-380 cycles per op. To try and improve performance I made stores use
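
For reference, on x86-64 the cost difference usually shows up only on the store side: compilers typically emit an implicitly locked xchg (or a plain store followed by mfence) for a memory_order_seq_cst store, while release stores and acquire loads compile to plain mov thanks to x86's strong ordering. A rough sketch of the typical code generation (compiler-dependent, not a complete program):

    ; seq_cst store of eax to [rdi]: full barrier
        xchg  eax, [rdi]          ; implicitly locked; this is the expensive part
    ; release store: a plain store is already ordered strongly enough on x86
        mov   [rdi], eax
    ; acquire (or seq_cst) load: a plain load
        mov   eax, [rdi]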

NASM tutorial uses int 80h, but this isn't working on Windows

Submitted by 孤街醉人 on 2021-02-07 14:16:18
Question: I'm starting NASM assembler after finishing FASM. I'm coding this on a Windows operating system. My code reads:

    section.data ; Constant
    msg: db "Hello World!"
    msg_L: equ $-msg ; Current - msg1
    section.bss ; Variable
    section.text ; Code
    global _WinMain@16
    _WinMain@16:
        mov eax, 4
        mov ebx, 1 ; Where to write it out. Terminal
        mov ecx, msg
        mov edx, msg_L
        int 80h
        mov eax, 1 ; EXIT COMMAND
        mov ebx, 0 ; No error
        int 80h

To compile it and execute I use:

    nasm -f win32 test.asm -o test.o
    ld test.o -o test.exe
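
int 80h is a Linux system-call gateway, so nothing on Windows answers it; a Windows build has to go through the Win32 API instead. A hedged 32-bit sketch of the same program using GetStdHandle/WriteFile/ExitProcess (the decorated names follow the usual 32-bit stdcall convention; the object must be linked against kernel32, e.g. with a 32-bit MinGW gcc):

        global  _main
        extern  _GetStdHandle@4
        extern  _WriteFile@20
        extern  _ExitProcess@4

        section .data
    msg:        db  "Hello World!", 13, 10
    msg_L:      equ $ - msg

        section .bss
    written:    resd 1

        section .text
    _main:
        push    -11                   ; STD_OUTPUT_HANDLE
        call    _GetStdHandle@4       ; console handle comes back in eax
        push    0                     ; lpOverlapped
        push    written               ; lpNumberOfBytesWritten
        push    msg_L                 ; nNumberOfBytesToWrite
        push    msg                   ; lpBuffer
        push    eax                   ; hFile
        call    _WriteFile@20         ; stdcall: the callee pops its arguments
        push    0                     ; exit code
        call    _ExitProcess@4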

VEX prefixes encoding and SSE/AVX MOVUP(D/S) instructions

Submitted by Deadly on 2021-02-07 13:50:22
Question: I'm trying to understand the VEX prefix encoding for the SSE/AVX instructions, so please bear with me if I ask something simple. I have the following related questions. Let's take the MOVUP(D/S) instruction (0F 10). If I follow the 2-byte VEX prefix encoding correctly, the following two instruction encodings produce the same result:

    db 0fh, 10h, 00000000b             ; movups xmm0, xmmword ptr [rax]
    db 0c5h, 11111000b, 10h, 00000000b ; vmovups xmm0, xmmword ptr [rax]

As do these two:

    db 066h, 0fh, 10h,
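
For orientation, the second byte of the 2-byte (C5h) VEX prefix packs four fields, which is why C5 F8 in front of opcode 10h reproduces plain movups. A commented sketch of those fields as encoded in the question's own example (field layout per the Intel SDM):

    ; 2-byte VEX prefix: C5h, then one byte laid out as  R vvvv L pp
    ;   R    (bit 7)    : inverted REX.R extension of ModRM.reg
    ;   vvvv (bits 6-3) : inverted extra source register, 1111b = unused
    ;   L    (bit 2)    : vector length, 0 = 128-bit, 1 = 256-bit
    ;   pp   (bits 1-0) : implied legacy prefix, 00=none, 01=66h, 10=F3h, 11=F2h
    db 0c5h, 11111000b, 10h, 00000000b   ; R=1, vvvv=1111, L=0, pp=00
                                         ; -> vmovups xmm0, xmmword ptr [rax]
    db 0c5h, 11111001b, 10h, 00000000b   ; same, but pp=01 (implied 66h prefix)
                                         ; -> vmovupd xmm0, xmmword ptr [rax]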

Getting the caller's Return Address

Submitted by 我们两清 on 2021-02-07 13:37:44
Question: I am trying to figure out how to grab the return address of a caller in MSVC. I can use _ReturnAddress() to get the return address of my function, but I can't seem to find a way to get the caller's. I've tried using CaptureStackBackTrace, but for some reason it crashes after many, many calls. I would also prefer a solution via inline assembly.

    void my_function(){
        cout << "return address of caller_function: " << [GET CALLER'S RETURN VALUE];
    }
    // imaginary return address: 0x15AF7C0
    void caller
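
One common approach, sketched below in 32-bit assembly, walks one frame up the EBP chain: the saved EBP links to the caller's frame, and the slot just above that holds the caller's own return address. This assumes both functions keep standard EBP frames (e.g. MSVC x86 with /Oy-); it is an illustration of the idea, not a documented MSVC API.

    ; inside my_function, after the standard prologue:
    ;   [ebp+4]     = my_function's own return address (what _ReturnAddress gives)
    ;   [ebp]       = caller's saved ebp
    ;   [[ebp] + 4] = the caller's return address (what the question wants)
    my_function:
        push ebp
        mov  ebp, esp
        mov  eax, [ebp]          ; caller's frame pointer
        mov  eax, [eax + 4]      ; caller's return address -> eax
        pop  ebp
        ret

When frame pointers are omitted, a stack-walking helper such as CaptureStackBackTrace with the current frame skipped is the supported route instead.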

Enable/Disable Hardware Lock Elision

Submitted by 蹲街弑〆低调 on 2021-02-07 13:34:13
Question: I am using glibc version 2.24. It includes a lock-elision path for the pthread_mutex_lock implementation using Transactional Synchronization Extensions such as _xbegin() and _xend(). The hardware is supposed to support lock elision, as the hle CPU flag stands for Hardware Lock Elision, I think. The processor I am using is an Intel(R) Xeon(R) Gold 6130 with the Skylake architecture. First I wanted to disable lock elision, but when I run the program that uses pthread_mutex_lock with perf stat -T to monitor
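
Before touching glibc's elision path, it is worth confirming what the hardware itself reports: HLE and RTM each have their own CPUID bit (leaf 7, sub-leaf 0, EBX bits 4 and 11). A small fragment sketching the check; whether glibc actually uses elision is a separate, glibc-side switch that these instructions do not control.

        mov  eax, 7          ; CPUID leaf 7: structured extended features
        xor  ecx, ecx        ; sub-leaf 0
        cpuid
        bt   ebx, 4          ; CF = 1  ->  HLE supported
        ; bt ebx, 11         ; CF = 1  ->  RTM supported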

Identifying faulting address on General Protection Fault (x86)

Submitted by 馋奶兔 on 2021-02-07 13:13:05
Question: I am trying to write an ISR for the General Protection Fault (#GP, vector 13) on x86. I am unable to figure out from the Intel docs how I can find the faulting address causing the exception. I know that for page fault exceptions (#PF, vector 14) the cr2 register holds the faulting address. Any help is appreciated.

Answer 1: All references I make here are from the AMD64 Architecture Programmer's Manual Volume 2: System Programming, which also describes the legacy protected-mode (i.e., x86) behavior. Figure 8-8
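
For contrast with #PF, a sketch of what a #GP handler actually receives: there is no CR2-style "faulting address" register for vector 13. The CPU pushes an error code (a segment-selector index when the fault is segment-related, otherwise zero), and the address of the faulting instruction is simply the saved CS:EIP in the interrupt frame. The sketch assumes a 32-bit handler entered without a privilege-level change.

    gp_handler:
        ; stack on entry:
        ;   [esp]      error code (selector index if segment-related, else 0)
        ;   [esp+4]    EIP of the faulting instruction
        ;   [esp+8]    CS
        ;   [esp+12]   EFLAGS
        mov  eax, [esp]       ; error code
        mov  edx, [esp+4]     ; address of the instruction that faulted
        ; ... log / handle the fault ...
        add  esp, 4           ; #GP pushes an error code; pop it before returning
        iretd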