Question
I am programming in assembly. This is just a simple piece of code so I can learn how to allocate arrays, in order to use them in NEON programming later.
ASM_FUNC(FPE)
.data
.balign 8
array: .skip 80
array1: .word 10,20,30,40
.text
ldr x0,=array
mov x1,#10
check:
cmp x1,#1
bne loop
b exit
loop:
str x1,[x0],#8 //Stores the value in x1 into x0 and moves the address +8 bytes
sub x1,x1,#1 //x1--
b check
exit:
mov x0,#11
ret
Some parts were commented out so I could try to find where the code breaks (I don't have a debugger on my system).
I started by commenting out the calculation part and added a mov x0,#11 at the end, right before the ret, to see if the problem was in the calculation. It turns out it was not.
When I uncommented the array: .skip 80 and ldr x0,=array lines, my application would just hang there with no response.
Can anyone please tell me what I am doing wrong? I am using A64 assembly on ARMv8.
The entry point is called from this C program:
void PocAsm_EntryPoint ( )
{
Print(L"========== ASM ==========\n");
UINT32 fff = FPE();
Print(L" %d \n",fff);
Print(L"=========== ASM ===========\n");
Print(L"Test version 0.24 \n");
return 0;
}
Unfortunately I couldn't find the definition of Print, so I apologize.
Answer 1:
This is an attempt to answer the following question: does the FPE() function work as expected once everything else is removed from the equation, using standard tools such as qemu-system-aarch64 and GDB?
The code for the FPE() function will be compiled for a Cortex-A53 qemu-virt machine.
Prerequisites:
- qemu-system-aarch64 is installed:
Ubuntu 20.04: sudo apt-get install qemu-system-arm
Windows 10: download and run the qemu-w64-setup-20201120.exe installer.
- the aarch64-none-elf toolchain for Cortex-A is installed. It can be downloaded from the Arm website. There are versions for both Linux and Windows 10.
FPE.s:
.arch armv8-a
.file "FPE.s"
.data
.balign 8
.globl array
array: .skip 80
array1: .word 10,20,30,40
.text
.align 2
.globl FPE
FPE:
ldr x0,=array
mov x1,#10
check:
cmp x1,#1
bne loop
b exit
loop:
str x1,[x0],#8 //Stores the value in x1 at the address in x0, then advances x0 by 8 bytes
sub x1,x1,#1 //x1--
b check
exit:
mov x0,#11
ret
.end
startup.s:
.title startup64.s
.arch armv8-a
.text
.section .text.startup,"ax"
.globl _start
_start:
ldr x0, =__StackTop
mov sp, x0
bl FPE
wait: wfe
b wait
.end
Building:
We will build FPE.elf for the qemu-virt machine (RAM starts at 0x40000000):
/opt/arm/9/gcc-arm-9.2-2019.12-x86_64-aarch64-none-elf/bin/aarch64-none-elf-gcc -nostdlib -nostartfiles -ffreestanding -g -Wl,--defsym,__StackTop=0x40010000 -Wl,--section-start=.text=0x40000000 -o FPE.elf startup.s FPE.s
Debugging:
Start qemu in a shell:
/opt/qemu-5.1.0/bin/qemu-system-aarch64 -semihosting -m 1M -nographic -serial telnet::4444,server,nowait -machine virt,gic-version=2,secure=on,virtualization=on -S -gdb tcp::1234,ipv4 -cpu cortex-a53 -kernel FPE.elf
Start GDB:
/opt/arm/9/gcc-arm-9.2-2019.12-x86_64-aarch64-none-elf/bin/aarch64-none-elf-gdb --quiet -nx -ex 'target remote localhost:1234' -ex 'load' -ex 'b _start' -ex 'b exit' FPE.elf
GDB should start:
Reading symbols from FPE.elf...
Remote debugging using localhost:1234
_start () at startup.s:7
7 ldr x0, =__StackTop
Loading section .text, size 0x50 lma 0x40000000
Loading section .data, size 0x60 lma 0x40010050
Start address 0x40000000, load size 176
Transfer rate: 85 KB/sec, 88 bytes/write.
Breakpoint 1 at 0x40000000: file startup.s, line 7.
Breakpoint 2 at 0x40000040: file FPE.s, line 28.
From this point, the commands stepi, p/x $x0, and x/10g 0x40010050 can be used to monitor the program's behavior until it reaches the exit label.
Here we simply display the 10 elements of the array at the _start and exit breakpoints:
(gdb) x/10g 0x40010050
0x40010050: 0 0
0x40010060: 0 0
0x40010070: 0 0
0x40010080: 0 0
0x40010090: 0 0
(gdb) continue
Continuing.
Breakpoint 2, exit () at FPE.s:28
28 mov x0,#11
(gdb) x/10g 0x40010050
0x40010050: 10 9
0x40010060: 8 7
0x40010070: 6 5
0x40010080: 4 3
0x40010090: 2 0
Single-stepping from this point shows that the program returns properly from its execution:
(gdb) stepi
29 ret
(gdb) stepi
wait () at startup.s:10
10 wait: wfe
(gdb) stepi
11 b wait
(gdb) stepi
10 wait: wfe
The answer to the question is therefore: yes, the code for the FPE() function works properly.
The exact same procedure can be run on Windows 10; it is just a matter of adjusting the three commands used to run aarch64-none-elf-gcc, qemu-system-aarch64 and GDB.
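The array contents seen at the exit breakpoint (10 down to 2, with the last slot still 0) can be cross-checked with a small C model of the loop. This is only a sketch: fpe_model and ARRAY_SLOTS are hypothetical names introduced for illustration, not part of the original code, and the model returns the number of stores rather than the constant 11 that the real FPE() leaves in x0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* .skip 80 reserves 80 bytes, i.e. ten 8-byte slots (hypothetical name). */
#define ARRAY_SLOTS 10

/* Hypothetical C model of the check/loop labels in FPE.s.
   Returns how many 64-bit stores were performed. */
size_t fpe_model(uint64_t array[ARRAY_SLOTS])
{
    assert(array != NULL);

    uint64_t *p = array;    /* ldr x0, =array */
    uint64_t x1 = 10;       /* mov x1, #10    */
    size_t stores = 0;

    while (x1 != 1) {       /* check: cmp x1, #1 ; bne loop        */
        *p++ = x1;          /* str x1, [x0], #8 (post-increment)   */
        x1--;               /* sub x1, x1, #1                      */
        stores++;
    }
    return stores;
}
```

Because the loop stops as soon as x1 reaches 1, only nine values are written and the tenth slot keeps its initial 0, which matches the x/10g output at the exit breakpoint.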
Comparing a dump of your object file with the one I tested may help in understanding the issue:
/opt/arm/9/gcc-arm-9.2-2019.12-x86_64-aarch64-none-elf/bin/aarch64-none-elf-as -o FPE.o FPE.s
/opt/arm/9/gcc-arm-9.2-2019.12-x86_64-aarch64-none-elf/bin/aarch64-none-elf-objdump -D FPE.o
FPE.o: file format elf64-littleaarch64
Disassembly of section .text:
0000000000000000 <FPE>:
0: 58000140 ldr x0, 28 <exit+0x8>
4: d2800141 mov x1, #0xa // #10
0000000000000008 <check>:
8: f100043f cmp x1, #0x1
c: 54000041 b.ne 14 <loop> // b.any
10: 14000004 b 20 <exit>
0000000000000014 <loop>:
14: f8008401 str x1, [x0], #8
18: d1000421 sub x1, x1, #0x1
1c: 17fffffb b 8 <check>
0000000000000020 <exit>:
20: d2800160 mov x0, #0xb // #11
24: d65f03c0 ret
...
Disassembly of section .data:
0000000000000000 <array>:
...
0000000000000050 <array1>:
50: 0000000a .inst 0x0000000a ; undefined
54: 00000014 .inst 0x00000014 ; undefined
58: 0000001e .inst 0x0000001e ; undefined
5c: 00000028 .inst 0x00000028 ; undefined
Dumping the complete ELF file of the minimal example would give:
/opt/arm/9/gcc-arm-9.2-2019.12-x86_64-aarch64-none-elf/bin/aarch64-none-elf-objdump -D FPE.elf
FPE.elf: file format elf64-littleaarch64
Disassembly of section .text:
0000000040000000 <_start>:
40000000: 580000c0 ldr x0, 40000018 <wait+0xc>
40000004: 9100001f mov sp, x0
40000008: 94000006 bl 40000020 <FPE>
000000004000000c <wait>:
4000000c: d503205f wfe
40000010: 17ffffff b 4000000c <wait>
40000014: 00000000 .inst 0x00000000 ; undefined
40000018: 40010000 .inst 0x40010000 ; undefined
4000001c: 00000000 .inst 0x00000000 ; undefined
0000000040000020 <FPE>:
40000020: 58000140 ldr x0, 40000048 <exit+0x8>
40000024: d2800141 mov x1, #0xa // #10
0000000040000028 <check>:
40000028: f100043f cmp x1, #0x1
4000002c: 54000041 b.ne 40000034 <loop> // b.any
40000030: 14000004 b 40000040 <exit>
0000000040000034 <loop>:
40000034: f8008401 str x1, [x0], #8
40000038: d1000421 sub x1, x1, #0x1
4000003c: 17fffffb b 40000028 <check>
0000000040000040 <exit>:
40000040: d2800160 mov x0, #0xb // #11
40000044: d65f03c0 ret
40000048: 40010050 .inst 0x40010050 ; undefined
4000004c: 00000000 .inst 0x00000000 ; undefined
Disassembly of section .data:
0000000040010050 <__data_start>:
...
00000000400100a0 <array1>:
400100a0: 0000000a .inst 0x0000000a ; undefined
400100a4: 00000014 .inst 0x00000014 ; undefined
400100a8: 0000001e .inst 0x0000001e ; undefined
400100ac: 00000028 .inst 0x00000028 ; undefined
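When reading the dump above, note that the words flagged as "undefined" inside .text (at 0x40000018 and 0x40000048) are not instructions. They are the literal-pool entries created by ldr x0, =__StackTop and ldr x0, =array, which objdump -D tries to disassemble anyway. The sketch below, using hypothetical helper names that are not part of the original answer, shows how the PC-relative ldr encoding points at its literal and how the two little-endian 32-bit words form the 64-bit address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute the address read by an A64
   "LDR Xt, <label>" (64-bit literal load) located at address pc. */
uint64_t ldr_literal_target(uint64_t pc, uint32_t insn)
{
    /* The 64-bit LDR (literal) encoding has 0x58 in its top byte. */
    assert((insn & 0xFF000000u) == 0x58000000u);

    int64_t imm19 = (insn >> 5) & 0x7FFFF;  /* bits [23:5] */
    if (imm19 & 0x40000)                    /* sign-extend the 19-bit field */
        imm19 -= 0x80000;

    return pc + (uint64_t)(imm19 * 4);      /* offset is counted in words */
}

/* Hypothetical helper: combine the two 32-bit words that objdump
   prints for one little-endian 64-bit literal. */
uint64_t literal64(uint32_t low_word, uint32_t high_word)
{
    return ((uint64_t)high_word << 32) | low_word;
}
```

With the values from the dump, ldr_literal_target(0x40000020, 0x58000140) gives 0x40000048, and literal64(0x40010050, 0x00000000) gives 0x40010050, the address where .data begins and where array is placed in this build.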
Source: https://stackoverflow.com/questions/64991510/execution-freezes-when-i-try-to-allocate-array-in-armv8-assembly