x86 | 易学教程

How does CPU perform operation that manipulate data that's less than a word size

Question: I have read that when the CPU reads from memory, it reads a whole word at once (4 or 8 bytes). How, then, can the CPU execute something like mov BYTE PTR [rbp-20], al, which copies only one byte from al to the stack, given that the data bus is 64 bits wide? It would be great if anyone could explain how this is implemented at the hardware level. Also, as we all know, when the CPU executes a program, it has a program counter or instruction pointer that points to the
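One way to picture a sub-word store is as a masked merge: the full 64-bit word is kept, and only the byte lane selected by a byte-enable mask is overwritten. The sketch below is a software model of that idea in C, not a description of any particular CPU. The function name store_byte_lane and the lane numbering (lane 0 is the least significant byte) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a one-byte store into a 64-bit word.
 * Lane 0 is the least significant byte (assumed numbering).
 * Hypothetical helper, for illustration only. */
static uint64_t store_byte_lane(uint64_t word, unsigned lane, uint8_t value)
{
    assert(lane < 8);                              /* a 64-bit word has 8 byte lanes */
    uint64_t mask = (uint64_t)0xFF << (lane * 8);  /* byte enable for the chosen lane */
    uint64_t data = (uint64_t)value << (lane * 8); /* new byte shifted into position */
    return (word & ~mask) | data;                  /* other lanes keep their old bytes */
}
```

For example, store_byte_lane(0x1122334455667788, 0, 0xAA) replaces only the low byte and leaves the other seven untouched.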

What is the avx2 instruction to store 8 integers?

Question: I want to store the 8 integers from a __m256i variable to an array of 8 x 32-bit ints. I thought the intrinsic for that would be _mm256_store_epi32, but I get an error saying that it doesn't even exist!
Answer 1: Have a look at the Intel Intrinsics Guide. Depending on whether your destination is 32-byte aligned, you need _mm256_store_si256 (aligned) or _mm256_storeu_si256 (unaligned).
Source: https://stackoverflow.com/questions/43304021/what-is-the-avx2-instruction-to-store-8-integers
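A minimal sketch of the unaligned variant, assuming an x86 CPU with AVX and a GCC or Clang compiler. The target attribute is a compiler extension that enables AVX for this one function; compiling the whole file with -mavx2 would work as well. The function name store_sequence is hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Build a vector holding first, first+1, ..., first+7 and store its
 * eight 32-bit lanes to out[0..7]. The unaligned store is used, so
 * out does not need 32-byte alignment. */
__attribute__((target("avx")))
void store_sequence(int32_t out[8], int32_t first)
{
    __m256i v = _mm256_setr_epi32(first,     first + 1, first + 2, first + 3,
                                  first + 4, first + 5, first + 6, first + 7);
    _mm256_storeu_si256((__m256i *)out, v);  /* lane 0 goes to out[0] */
}
```

If the destination is known to be 32-byte aligned (for instance, declared with alignas(32)), _mm256_store_si256 can be used in the same place.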

How to count matches using compare + je?

Question: I am writing code that counts how many words are in a string. How can I increment a register using je? For example:
    cmp a[bx+1],00h
    je inc cx
Answer 1: je is a conditional jump. Unlike ARM, x86 can't directly predicate another single instruction on an arbitrary condition. There's no single machine instruction that can do anything like je inc cx or ARM-style inceq cx. Instead, you need to build the logic yourself by conditionally branching over other instruction(s). If you want to increase
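The "branch over the increment" pattern the answer describes can be sketched in C, with comments marking the rough assembly counterpart of each step. This is an illustration under assumptions: count_matches and its parameters are hypothetical names, and a real compiler is free to emit different (even branchless) code for it.

```c
#include <stddef.h>

/* Count how many bytes in buf[0..len) equal target. */
size_t count_matches(const unsigned char *buf, size_t len, unsigned char target)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != target)   /* cmp, then jne: jump over the increment */
            continue;
        count++;                /* inc: reached only when the bytes matched */
    }
    return count;
}
```

Note the inverted condition: to increment on "equal", the jump that skips the increment is taken on "not equal".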

Why is scanf returning 0.000000 when it is supplied with a double?

Question: I have the following assembly code (written for NASM on Linux):
    ; This code has been generated by the 7Basic
    ; compiler <http://launchpad.net/7basic>
    extern printf
    extern scanf
    SECTION .data
    printf_f: db "%f",10,0
    scanf_f: db "%f",0
    SECTION .bss
    v_0 resb 8
    SECTION .text
    global main
    main:
    push ebp
    mov ebp,esp
    push v_0 ; load the address of the variable
    push scanf_f ; push the format string
    call scanf ; call scanf()
    add esp,8
    push dword [v_0+4] ; load the upper-half of the double
    push dword [v
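One thing worth checking in code like this: in C, the scanf conversion %f expects a pointer to a 4-byte float, while %lf expects a pointer to an 8-byte double, so reading into an 8-byte variable with %f fills only part of it. The sketch below shows the matching format and type in C, using sscanf so that it runs without console input. parse_double is a hypothetical helper name, not part of the question's code.

```c
#include <stdio.h>

/* Parse a double from text. Returns 1 on success, 0 otherwise.
 * "%lf" matches a double *; "%f" would match a float * instead. */
int parse_double(const char *text, double *out)
{
    return sscanf(text, "%lf", out) == 1;
}
```

printf is different: its variadic float arguments are promoted to double, so %f is the right conversion there for both types.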

What does it mean that “registers are preserved across function calls”?

Question: According to this question, What registers are preserved through a linux x86-64 function call, the following registers are preserved across function calls: r12, r13, r14, r15, rbx, rsp, rbp. So I went ahead and ran a test with the following:
    .globl _start
    _start:
    mov $5, %r12
    mov $5, %r13
    mov $5, %r14
    mov $5, %r15
    call get_array_size
    mov $60, %eax
    syscall
    get_array_size:
    mov $0, %r12
    mov $0, %r13
    mov $0, %r14
    mov $0, %r15
    ret
And, I was thinking that after the call get_array_size that my

MMX Register Speed vs Stack for Unsigned Integer Storage

Question: I am contemplating an implementation of SHA3 in pure assembly. SHA3 has an internal state of 17 64-bit unsigned integers, but because of the transformations it uses, the best case could be achieved if I had 44 such integers available in registers, plus possibly one scratch register. In that case, I would be able to do the entire transform in registers. But this is unrealistic, and optimisation is possible all the way down to even just a few registers. Still, more is potentially

x86 NASM Indirect Far Jump In Real Mode

Question: I have been messing around with a multi-stage bootloader and I have got all of my code to work, except for the last part: the jump. I had this code working before, but I wanted to make it more modular by replacing this line:
    jmp 0x7E0:0
with this one:
    jmp far [Stage2Read + SectorReadParam.bufoff]
Instead of hard-coding where the code will load, I wanted to do an indirect jump to it. Here's the rest of my code:
    ; This is stage 1 of a multi-stage bootloader
    bits 16
    org 0x7C00