assembly | 易学教程

Can't understand assembly mov instruction between register and a variable

Question: I am using the NASM assembler on 64-bit Linux. There is something about variables and registers that I can't understand. I create a variable named "msg": msg db "hello, world". Now, when I want to write to stdout, I move msg into the rsi register; however, I don't understand the mov instruction at the bit level ... the rsi register is 64 bits wide, while the msg variable has 12 symbols of 8 bits each, which means msg has a size of 12 * 8 bits, which is obviously greater than 64 bits. So
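The size mismatch in the question goes away once you note that, in NASM syntax, `mov rsi, msg` places the address of `msg` in the register and not its 12 bytes of content (loading the content would be written `mov rsi, [msg]`). The C sketch below illustrates the same distinction. The helper names `msg_address`, `msg_length` and `address_width_bits` are hypothetical and exist only for this illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* The string bytes live in memory. Unlike NASM's
   msg db "hello, world", C appends a terminating NUL byte,
   so the visible text is sizeof msg - 1 bytes long. */
static const char msg[] = "hello, world";

static_assert(sizeof msg - 1 == 12, "the text is 12 bytes (96 bits) long");

/* What a register holds: the address of the first byte,
   not the bytes themselves. */
uintptr_t msg_address(void) { return (uintptr_t)msg; }

/* The length travels separately (for the write system call it
   goes in a second register, rdx). */
size_t msg_length(void) { return sizeof msg - 1; }

/* Width of an address in bits; 64 on x86-64 Linux. */
size_t address_width_bits(void) { return sizeof(uintptr_t) * 8; }
```

The 96 bits of text never need to fit in the register: the register carries only the location where the text starts, and the byte count is passed separately.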

Instructions appended to end of assembly

Question: I am trying to follow this tutorial for creating a binary file, but the linker appears to be appending additional instructions at the end of the assembly. I assume this is the OS's tear-down process. The tutorial attempts to compile a bare-bones 32-bit C program on Linux:

    int main() { }

using these commands:

    gcc -c test.c
    ld -o test -Ttext 0x0 -e main test.o
    objcopy -R .note -R .comment -S -O binary test test.bin
    ndisasm -b 32 test.bin

I am running 64-bit Linux, and hence modified the

How can memory destination BTS be significantly slower than load / BTS reg,reg / store?

Question: In the general case, how can an instruction that can take memory or register operands ever be slower with memory operands than mov + mov -> instruction -> mov + mov? Based on the throughput and latency found in Agner Fog's instruction tables (looking at Skylake in my case, p. 238), I see the following numbers for the btr/bts instructions:

    instruction  operands  uops fused domain  uops unfused domain  latency  throughput
    mov          r,r       1                  1                    0-1      .25
    mov          m,r       1                  2                    2        1
    mov          r,m       1                  1                    2        .5
    ...
    bts/btr      r,r       1                  1
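One documented difference between the two operand forms may be relevant here: with a register destination, BTS takes the bit index modulo the operand size, whereas with a memory destination and a register bit index, the index is a signed offset into a bit string and can address bytes outside the addressed word. The C sketch below models only that semantic difference, at byte granularity. It says nothing about uop counts or timings, and the function names `bts_reg` and `bts_mem` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `bts r64, r64`: the bit index wraps modulo 64.
   Returns the previous value of the bit (what lands in CF). */
int bts_reg(uint64_t *reg, uint64_t index) {
    unsigned bit = (unsigned)(index & 63);
    int old = (int)((*reg >> bit) & 1u);
    *reg |= (uint64_t)1 << bit;
    return old;
}

/* Model of `bts [base], r64`: the index is a signed bit offset
   relative to base and is NOT wrapped, so the byte that gets
   modified depends on the index and may lie before or after base.
   The caller must ensure the target byte is valid memory. */
int bts_mem(uint8_t *base, int64_t index) {
    int64_t byte = index / 8;
    int64_t bit = index % 8;
    if (bit < 0) {            /* floor division for negative offsets */
        bit += 8;
        byte -= 1;
    }
    uint8_t *p = base + byte;
    int old = (*p >> bit) & 1;
    *p |= (uint8_t)(1u << bit);
    return old;
}
```

With index 65, `bts_reg` sets bit 1 of the register, while `bts_mem` sets bit 1 of the byte 8 bytes past `base`. A load / `bts r,r` / store sequence therefore only matches the memory form when the index is known to stay inside the loaded word.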

Performance optimisations of x86-64 assembly - Alignment and branch prediction

Question: I’m currently coding highly optimised versions of some C99 standard library string functions, like strlen(), memset(), etc., using x86-64 assembly with SSE2 instructions. So far I’ve managed to get excellent results in terms of performance, but I sometimes get weird behaviour when I try to optimise further. For instance, adding or even removing some simple instructions, or simply reorganising some local labels used with jumps, completely degrades the overall performance. And there’s absolutely