assembly

Can't understand assembly mov instruction between register and a variable

南笙酒味 submitted on 2021-02-10 00:31:28
Question: I am using the NASM assembler on 64-bit Linux. There is something about variables and registers I can't understand. I create a variable named "msg": msg db "hello, world" Now, when I want to write to stdout, I move msg into the rsi register, but I don't understand the mov instruction at the bit level... the rsi register is 64 bits wide, while the msg variable has 12 characters of 8 bits each, which means msg is 12 * 8 = 96 bits, obviously more than 64 bits. So
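
A note on the snippet above: in NASM, mov rsi, msg (no brackets) loads the address of the label msg into RSI. That address is a single 64-bit value, not the 12 string bytes. Only mov rsi, [msg] reads memory, and it copies just the first 8 bytes. The minimal C sketch below models both forms. It assumes x86-64 Linux, where the write system call takes the buffer pointer in RSI and the byte count (here 12) in RDX. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors NASM's `msg db "hello, world"`: 12 bytes, no terminating NUL.
   C allows an exact-size char array initializer to drop the NUL. */
static const char msg[12] = "hello, world";
static_assert(sizeof msg == 12, "msg should be 12 bytes, like the db line");

/* Model of `mov rsi, msg`: RSI receives the address of the first byte.
   That address is one 64-bit value however long the string is. */
static const char *mov_rsi_msg(void)
{
    return msg;
}

/* Model of `mov rsi, [msg]`: 8 bytes are read from memory into the
   64-bit register; the last 4 characters are not loaded at all. */
static uint64_t mov_rsi_mem_msg(void)
{
    uint64_t value;
    memcpy(&value, msg, sizeof value);
    return value;
}
```

For example, mov_rsi_msg() returns &msg[0], and the kernel later reads all 12 bytes through that pointer. The 8-byte value from mov_rsi_mem_msg() holds only the bytes "hello, w".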

Instructions appended to end of assembly

孤者浪人 submitted on 2021-02-09 09:21:46
Question: I am trying to follow this tutorial for creating a binary file, but the linker appears to be appending additional instructions at the end of the assembly. I assume this is the OS's tear-down process. The tutorial attempts to compile a bare-bones 32-bit C program on Linux: int main() { } using these commands: gcc -c test.c; ld -o test -Ttext 0x0 -e main test.o; objcopy -R .note -R .comment -S -O binary test test.bin; ndisasm -b 32 test.bin. I am running 64-bit Linux, and hence modified the

How can memory destination BTS be significantly slower than load / BTS reg,reg / store?

拜拜、爱过 submitted on 2021-02-09 04:37:06
Question: In the general case, how can an instruction that can take memory or register operands ever be slower with memory operands than mov + mov -> instruction -> mov + mov? Based on the throughput and latency figures in Agner Fog's instruction tables (looking at Skylake in my case, p238), I see the following numbers for the btr/bts instructions:

instruction  operands  uops fused domain  uops unfused domain  latency  throughput
mov          r,r       1                  1                    0-1      .25
mov          m,r       1                  2                    2        1
mov          r,m       1                  1                    2        .5
...
bts/btr      r,r       1                  1
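
One plausible explanation comes from Intel's description of BT/BTS. When the destination is memory and the bit index is in a register, the index is not reduced modulo the operand size. Instead it selects a bit anywhere in a bit string that starts at that address, possibly far past the addressed qword. With a register destination, the index is taken modulo 64. The memory form therefore has to compute an address from the index. Load + bts reg,reg + store never does this, because it only touches one fixed location, and it may be why the memory form decodes to many more uops. The C sketch below models the two semantics for 64-bit operands. The function names are hypothetical, and negative indices (which the hardware also accepts) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of `bts r64, r64`: the bit index is taken modulo 64, so only the
   destination register itself can change. Returns the old bit (CF). */
static int bts_reg(uint64_t *reg, uint64_t bit_index)
{
    unsigned bit = (unsigned)(bit_index & 63);
    int old = (int)((*reg >> bit) & 1);
    *reg |= (uint64_t)1 << bit;
    return old;
}

/* Model of `bts m64, r64` (bit-string semantics): the qword actually
   modified is base[bit_index / 64], so the address depends on the index.
   This sketch treats the index as unsigned. Returns the old bit (CF). */
static int bts_mem_bitstring(uint64_t *base, uint64_t bit_index)
{
    assert(base != NULL);
    uint64_t *word = base + (bit_index >> 6);  /* index-dependent address */
    unsigned bit = (unsigned)(bit_index & 63);
    int old = (int)((*word >> bit) & 1);
    *word |= (uint64_t)1 << bit;
    return old;
}
```

With index 70, bts_reg sets bit 6 of the register (70 mod 64). bts_mem_bitstring instead sets bit 6 of mem[1] and leaves mem[0] untouched. A plain load / bts reg,reg / store sequence would give the bts_reg result, not the bit-string one.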

Performance optimisations of x86-64 assembly - Alignment and branch prediction

故事扮演 submitted on 2021-02-08 19:50:37
Question: I'm currently coding highly optimised versions of some C99 standard library string functions, like strlen(), memset(), etc., using x86-64 assembly with SSE-2 instructions. So far I've managed to get excellent results in terms of performance, but I sometimes get weird behaviour when I try to optimise further. For instance, adding or even removing some simple instructions, or simply reorganising some local labels used with jumps, completely degrades the overall performance. And there's absolutely
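
Hand-written SSE2 string functions commonly round the pointer down to a 16-byte boundary. That way every aligned 16-byte load (for example movdqa) stays inside one 16-byte block, and therefore inside one page. NASM's align 16 directive uses the same power-of-two arithmetic when it pads a loop label with NOPs up to the next multiple of 16. The following is a minimal C sketch of that arithmetic only. The helper names are hypothetical, and it does not model the branch-prediction or instruction-fetch effects the question is about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Padding bytes needed to move `addr` up to the next multiple of `align`
   (0 if already aligned). `align` must be a power of two, e.g. 16. */
static size_t padding_to_align(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size_t)((align - (addr & (align - 1))) & (align - 1));
}

/* Round `addr` down to a multiple of `align`: the start of the aligned
   block that contains it, as used before an aligned 16-byte load. */
static uintptr_t align_down(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(uintptr_t)(align - 1);
}
```

For example, a label at offset 0x1001 needs 15 bytes of padding to reach 0x1010. A string pointer 0x1009 is loaded from the aligned block at 0x1000, and the bytes before the real start are masked out afterwards.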