Question
UPDATE: I solved this problem with the help of Mark Tolonen's answer below. Here is the solution (but I'm puzzled by one thing):
I begin with the encoding string shown in Mark Tolonen's answer below (UTF-8):
CA_f1 = (ctypes.c_char_p * len(f1))(*(name.encode() for name in f1))
With optimizations off, I always store rcx into a memory variable on entry. Later in the program when I need to use the pointer in rcx, I read it from memory. That works for a single pointer, but doesn't work for accessing the pointer array Mark Tolonen showed below; maybe that's because it's a pointer array, not just a single pointer. It DOES work if I store rcx into r15 on entry, and downstream in the program it works like this:
;To access the first char of the first name pair:
xor rax,rax
mov rdx,qword[r15]
movsx eax,BYTE[rdx]
ret
;To access the second char of the second name pair:
mov rdx,qword[r15+8]
movsx eax,BYTE[rdx+1]
That's not a problem because I usually store as many variables as possible in registers; sometimes there are not enough registers, so I have to resort to storing some in memory. Now, when processing strings, I will always reserve r15 to hold the pointer passed in rcx if it's a pointer array.
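The two-level lookup used in the assembly above can be reproduced from Python to check which byte each access should return. This is only a sketch: the sample `names` list and the helper `char_at` are hypothetical, and the slot size is taken from `ctypes.sizeof(ctypes.c_void_p)` (8 bytes on a 64-bit build, matching the `+8` offset above).

```python
import ctypes

# Hypothetical sample data standing in for the names read from the file.
names = ["Mark", "John", "Craig"]
ca = (ctypes.c_char_p * len(names))(*(n.encode() for n in names))

def char_at(array, index, offset):
    """Mimic `mov rdx,[r15+index*ptr_size]` followed by `movsx eax,BYTE[rdx+offset]`."""
    ptr_size = ctypes.sizeof(ctypes.c_void_p)
    # First load: the string address stored in slot `index` of the pointer array.
    slot_addr = ctypes.addressof(array) + index * ptr_size
    string_addr = ctypes.c_void_p.from_address(slot_addr).value
    # Second load: one byte at `offset` inside that string.
    return ctypes.c_char.from_address(string_addr + offset).value

first = char_at(ca, 0, 0)   # first char of the first name
second = char_at(ca, 1, 1)  # second char of the second name
print(first, second)
```

Comparing these bytes with the value returned in eax is a quick way to confirm that the DLL is dereferencing the array as intended.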
Any insight into why the memory location doesn't work?
**** END OF ANSWER ****
I'm new to string processing in NASM, and I am passing a string from ctypes. The string data is read from a text file (Windows .txt), using the following Python function:
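def StrDataRead(fname):  # name taken from the wrapper call shown further down
    return_data = []
    with open(fname, encoding="utf8") as f1:
        for item in f1:
            # Strip leading and trailing whitespace, including the line ending
            item = item.strip()
            return_data.append(item)
    return return_data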
The .txt file contains a list of first and last names, separated by carriage return/linefeed (CRLF) line endings.
I pass a c_char_p pointer to a NASM dll using ctypes. The pointer is created with this:
CA_f1 = (ctypes.c_char_p * len(f1))()
Visual Studio confirms that it is a pointer to a byte string 50 NAMES long, which may be where the problem is: I need bytes, not list elements. Then I pass it using this ctypes syntax:
CallName.argtypes = [ctypes.POINTER(ctypes.c_char_p),ctypes.POINTER(ctypes.c_double),ctypes.POINTER(ctypes.c_double)]
UPDATE: before passing the string, now I convert the list to a string like this:
f1_x = ' '.join(f1)
Now VS shows a pointer to a 558 byte string, which is correct, but I still can't read a byte.
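For reference, one way to hand a single joined string to a DLL is to encode it to bytes and pass that buffer as a plain `c_char_p` argument rather than as a `POINTER(c_char_p)` array. The following is a minimal sketch and not the code used in this question; the sample names are hypothetical and no DLL is loaded.

```python
import ctypes

# Hypothetical sample names; the real list comes from the text file.
f1 = ["Margaret Swanson", "John Smith"]
f1_x = " ".join(f1).encode("utf-8")  # one contiguous bytes object

# create_string_buffer copies the bytes into a char array and appends a NUL.
buf = ctypes.create_string_buffer(f1_x)

# A single pointer to contiguous data: offset 1 is the second character.
second_byte = buf.raw[1]
print(len(buf), chr(second_byte))
```

With this layout the function would receive the buffer's address itself in its argument register, so a byte can be read with one load instead of two. That follows from the calling convention rather than from anything tested here.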
In my NASM program, I test it by reading a random byte into al using the following code:
lea rdi,[rel f1_ptr]
mov rbp,qword [rdi] ; Pointer
xor rax,rax
mov al,byte[rbp+1]
But the return value in rax is 0.
If I create a local string buffer like this:
name_array: db "Margaret Swanson"
I can read it this way:
mov rdi,name_array
xor rax,rax
mov al,[rdi]
But not from a pointer passed into a dll.
Here's the full code for a simple, reproducible example in NASM. Before passing it to NASM, I checked random bytes and they are what I expect, so I don't think it's encoding.
[BITS 64]
[default rel]
extern malloc, calloc, realloc, free
global Main_Entry_fn
export Main_Entry_fn
global FreeMem_fn
export FreeMem_fn
section .data align=16
f1_ptr: dq 0
f1_length: dq 0
f2_ptr: dq 0
f2_length: dq 0
data_master_ptr: dq 0
section .text
String_Test_fn:
;______
lea rdi,[rel f1_ptr]
mov rbp,qword [rdi]
xor rax,rax
mov al,byte[rbp+10]
ret
;__________
;Free the memory
FreeMem_fn:
sub rsp,40
call free
add rsp,40
ret
; __________
; Main Entry
Main_Entry_fn:
push rdi
push rbp
mov [f1_ptr],rcx
mov [f2_ptr],rdx
mov [data_master_ptr],r8
lea rdi,[data_master_ptr]
mov rbp,[rdi]
xor rcx,rcx
movsd xmm0,qword[rbp+rcx]
cvttsd2si rax,xmm0
mov [f1_length],rax
add rcx,8
movsd xmm0,qword[rbp+rcx]
cvttsd2si rax,xmm0
mov [f2_length],rax
add rcx,8
call String_Test_fn
pop rbp
pop rdi
ret
UPDATE 2:
In reply to a request, here is a ctypes wrapper to use:
def Read_Data():
    Dir = "[FULL PATH TO DATA]"
    fname1 = Dir + "Random Names.txt"
    fname2 = Dir + "Random Phone Numbers.txt"
    f1 = Trans_02_Data.StrDataRead(fname1)
    f2 = Trans_02_Data.StrDataRead(fname2)
    f2_Int = [int(numeric_string) for numeric_string in f2]
    StringTest_asm(f1, f2_Int)

def StringTest_asm(f1, f2):
    f1.append("0")
    f1_x = ' '.join(f1)
    f1_x[0].encode(encoding='UTF-8', errors='strict')
    Input_Length_Array = []
    Input_Length_Array.append(len(f1))
    Input_Length_Array.append(len(f2*8))
    length_array_out = (ctypes.c_double * len(Input_Length_Array))(*Input_Length_Array)
    CA_f1 = (ctypes.c_char_p * len(f1_x))()  # due to SO research
    CA_f2 = (ctypes.c_double * len(f2))(*f2)
    hDLL = ctypes.WinDLL("C:/NASM_Test_Projects/StringTest/StringTest.dll")
    CallName = hDLL.Main_Entry_fn
    CallName.argtypes = [ctypes.POINTER(ctypes.c_char_p), ctypes.POINTER(ctypes.c_double), ctypes.POINTER(ctypes.c_double)]
    CallName.restype = ctypes.c_int64
    Free_Mem = hDLL.FreeMem_fn
    Free_Mem.argtypes = [ctypes.POINTER(ctypes.c_double)]
    Free_Mem.restype = ctypes.c_int64
    start_time = timeit.default_timer()
    ret_ptr = CallName(CA_f1, CA_f2, length_array_out)
    abc = 1  # Check the value of the ret_ptr, should be non-zero
Answer 1:
Your name-reading code would return a list of Unicode strings. The following would encode a list of Unicode strings into an array of strings to be passed to a function taking a POINTER(c_char_p):
>>> import ctypes
>>> names = ['Mark','John','Craig']
>>> ca = (ctypes.c_char_p * len(names))(*(name.encode() for name in names))
>>> ca
<__main__.c_char_p_Array_3 object at 0x000001DB7CF5F6C8>
>>> ca[0]
b'Mark'
>>> ca[1]
b'John'
>>> ca[2]
b'Craig'
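By contrast, an array created without initializers, as `CA_f1` is in the question, holds only NULL pointers and no string data. A short sketch of the difference, reusing the sample `names` list from the session above:

```python
import ctypes

names = ["Mark", "John", "Craig"]

# No initializers: every slot is a NULL pointer, which ctypes reports as None.
empty = (ctypes.c_char_p * len(names))()
empty_slots = list(empty)

# With initializers: every slot points at an encoded, NUL-terminated string.
filled = (ctypes.c_char_p * len(names))(*(n.encode() for n in names))
filled_slots = list(filled)

print(empty_slots)
print(filled_slots)
```

Only the second form gives the DLL addresses it can safely dereference.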
If ca is passed to your function as the first parameter, the address of that array would be in rcx per the x64 calling convention. The following C code and its disassembly show how the VS2017 Microsoft compiler reads it:
DLL code (test.c)
#define API __declspec(dllexport)
int API func(const char** instr)
{
return (instr[0][0] << 16) + (instr[1][0] << 8) + instr[2][0];
}
Disassembly (compiled optimized to keep short, my comments added)
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.00.24215.1
include listing.inc
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC func
; Function compile flags: /Ogtpy
; File c:\test.c
_TEXT SEGMENT
instr$ = 8
func PROC
; 5 : return (instr[0][0] << 16) + (instr[1][0] << 8) + instr[2][0];
00000 48 8b 51 08 mov rdx, QWORD PTR [rcx+8] ; address of 2nd string
00004 48 8b 01 mov rax, QWORD PTR [rcx] ; address of 1st string
00007 48 8b 49 10 mov rcx, QWORD PTR [rcx+16] ; address of 3rd string
0000b 44 0f be 02 movsx r8d, BYTE PTR [rdx] ; 1st char of 2nd string, r8d=4a
0000f 0f be 00 movsx eax, BYTE PTR [rax] ; 1st char of 1st string, eax=4d
00012 0f be 11 movsx edx, BYTE PTR [rcx] ; 1st char of 3rd string, edx=43
00015 c1 e0 08 shl eax, 8 ; eax=4d00
00018 41 03 c0 add eax, r8d ; eax=4d4a
0001b c1 e0 08 shl eax, 8 ; eax=4d4a00
0001e 03 c2 add eax, edx ; eax=4d4a43
; 6 : }
00020 c3 ret 0
func ENDP
_TEXT ENDS
END
Python code (test.py)
from ctypes import *
dll = CDLL('test')
dll.func.argtypes = POINTER(c_char_p),
dll.func.restype = c_int
names = ['Mark','John','Craig']
ca = (c_char_p * len(names))(*(name.encode() for name in names))
print(hex(dll.func(ca)))
Output:
0x4d4a43
Those are the correct ASCII codes for 'M', 'J', and 'C'.
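The expected value can also be checked without building the DLL. The sketch below repeats the arithmetic of the C function in pure Python on the same sample names:

```python
names = ["Mark", "John", "Craig"]
encoded = [n.encode() for n in names]

# Same arithmetic as the C function: first byte of each string, shifted and summed.
expected = (encoded[0][0] << 16) + (encoded[1][0] << 8) + encoded[2][0]
print(hex(expected))
```

If the DLL returns a different value for the same input, the mismatch points at how the pointers are being read rather than at the encoding.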
Source: https://stackoverflow.com/questions/54314682/python-ctypes-how-to-read-a-byte-from-a-character-array-passed-to-nasm