Question
This is my function prototype:
char* swap(char* array, int index1, int index2);
This is my assembly code:
segment .text
global swap
swap:
mov r14,[rdi+rsi]
mov r15,[rdi+rdx]
mov [rdi+rsi],r15 ;this line segfaults
mov [rdi+rdx],r14
mov rax,rdi
ret
The lines mov [rdi+rsi],r15 and mov [rdi+rdx],r14 give me a segfault; I'm not sure where I'm going wrong.
The calling function:
#include <stdio.h>
#include <stdlib.h>
extern char* swap(char* str, int indexA, int indexB);
int main()
{
char* st= "E X A M P L E";
printf("Before swap: \t%s\n", st);
char * res = swap(st, 2 ,10);
printf("After swap: \t%s\n", res);
return 0;
}
Expected output:
Before swap: E X A M P L E
After swap: E L A M P X E
Answer 1:
The primary problem is that your st variable is defined as a pointer to a string literal:
char* st= "E X A M P L E";
String literals in C must be treated as read-only: modifying one is undefined behaviour. What happens is unknown and specific to the compiler and the environment the program runs in. On most modern OSes, modern compilers place string literals in memory that isn't writable, so an attempt to write to them raises an exception. That is what happened in your case: the write in the assembly code faulted.
If you wish to create a character array in writable memory, you can define st this way:
char st[] = "E X A M P L E";
Issues with the Assembly Code
One issue is that the indices you pass to swap are int. With 64-bit GCC/Clang, int is 32 bits. When a 32-bit signed int is passed to the assembly code, the upper 32 bits of RSI and RDX are not guaranteed to be zeroed or sign-extended; they may contain garbage. Given that your indices are never negative, you should use an unsigned type, preferably a 64-bit one. I would recommend size_t: in x86-64 code it is unsigned and 64 bits wide, so the assembly code can use the index values directly without first sign- or zero-extending them to 64 bits. I'm recommending changing swap to be:
char* swap(char* array, size_t index1, size_t index2);
If you keep index1 and index2 as signed integers (int), the beginning of your assembly code would have to use MOVSX to sign-extend ESI into RSI and EDX into RDX. That code would look like:
swap:
movsx rsi, esi ; Sign extend 32-bit index1 parm in ESI to 64-bits
movsx rdx, edx ; Sign extend 32-bit index2 parm in EDX to 64-bits
; rest of function here
If you had used 32-bit unsigned int for index1 and index2, you would have to zero-extend the 32-bit values with:
mov esi, esi ; Zero extend 32-bit index1 parm in ESI to 64-bits
mov edx, edx ; Zero extend 32-bit index2 parm in EDX to 64-bits
; rest of function here
When the destination of an operation is a 32-bit register in 64-bit mode, the CPU automatically zeroes the upper 32 bits of the full 64-bit register. Moving a 32-bit register such as ESI to itself therefore clears the upper 32 bits of RSI while keeping the low 32 bits. This applies to all the general-purpose registers.
RBX, RBP, and R12–R15 are non-volatile (callee-saved) registers according to the x86-64 System V ABI. If your function modifies them, their original contents have to be preserved. You can push them on the stack at the start and pop their original values off the stack when finished. The simpler approach is to use volatile registers, which don't need to be preserved, such as RAX, RCX, RDX, RSI, RDI and R8–R11.
When you move data to or from memory using a 64-bit register, 64 bits (8 bytes) are transferred. For example,
mov r14,[rdi+rsi]
loads the 8 bytes starting at address rdi+rsi into the 64-bit register R14. The later write does the same in reverse: it updates 8 bytes in memory rather than one byte. Updating 8 bytes can smash the stack if the character array is placed on the stack, which is what happens once st is declared as a local array with char st[] in main: the 8-byte accesses at index 10 of the 14-byte array reach 4 bytes past its end.
When using the numbered registers R8 to R15, you can reference the low 8 bits by adding a b suffix to the register name (w is for the 16-bit word, d for the 32-bit doubleword). For example, in NASM/YASM syntax for 64-bit mode the 64-, 32-, 16- and 8-bit names for R14 are R14, R14D, R14W and R14B. So
mov r14,[rdi+rsi]
would be written as
mov r14b,[rdi+rsi]
to move a single byte. You would have to make that change to each of the other moves as well.
Assuming you change index1 and index2 to have type size_t (or uint64_t), your assembly code could have been written as:
segment .text
global swap
swap:
push r14 ; Save non-volatile registers we overwrite
push r15
mov r14b,[rdi+rsi] ; Move one byte from [rdi+rsi] to R14B. R14B is lower 8 bits of R14
mov r15b,[rdi+rdx] ; Move one byte from [rdi+rdx] to R15B. R15B is lower 8 bits of R15
mov [rdi+rsi],r15b ; Move the byte in R15B to [rdi+rsi]
mov [rdi+rdx],r14b ; Move the byte in R14B to [rdi+rdx]
mov rax,rdi ; Return the array pointer in RAX
pop r15 ; Restore non-volatile registers
pop r14
ret
If you use volatile registers instead of the non-volatile ones, the code can be simplified to:
segment .text
global swap
swap:
mov al,[rdi+rsi] ; Move one byte from [rdi+rsi] to AL. AL is lower 8 bits of RAX
mov cl,[rdi+rdx] ; Move one byte from [rdi+rdx] to CL. CL is lower 8 bits of RCX
mov [rdi+rsi],cl ; Move the byte in CL to [rdi+rsi]
mov [rdi+rdx],al ; Move the byte in AL to [rdi+rdx]
mov rax,rdi ; Return the array pointer in RAX (overwrites AL, which is no longer needed)
ret
In this case we use the lower 8 bits of the volatile registers RAX (AL) and RCX (CL) to do the swap. Since these registers don't have to be preserved, there is no need to save and restore them.
Answer 2:
Part of the problem here is that the code tries to write to non-writable memory, which will not work. (There are also other correctness problems with the asm; see @MichaelPetch's answer.)
When this is created:
char* st= "E X A M P L E";
Because it creates a string literal, the pointer st refers to a memory location that is not writable.
If created as:
char st[] = "E X A M P L E";
st is stored in writable memory, and its contents are the characters themselves, instead of just a pointer to a read-only string literal.
Source: https://stackoverflow.com/questions/64268543/function-that-takes-a-char-array-and-2-indices-swapping-the-chars-in-those-indi