ARM64 using gas on iOS?

粉色の甜心 2021-01-14 05:12

I've got some assembly functions I've ported to 64-bit ARM, and they work fine on Android, but when I tried to compile the same files in Xcode, I discovered that clang use

2 answers
  •  小蘑菇 (original poster)
    2021-01-14 05:51

    Let's use my answer as a general guide to writing ARM64 code on Android and iOS. To begin, we'll start with the volatile and non-volatile registers (Wikipedia):

    X0-X7 - arguments and return value (volatile)
    X8 - indirect result (struct) location (or temp reg)
    X9-X15 - temporary (volatile)
    X16-X17 - intra-procedure-call scratch registers (PLT, linker) or temp
    X18 - platform-specific use (TLS); reserved on iOS, so avoid it in portable code
    X19-X28 - callee-saved registers (non-volatile)
    X29 - frame pointer
    X30 - link register (LR)
    SP / XZR - stack pointer or zero register (same register number, chosen by the instruction)
    V0-V7, V16-V31 - volatile NEON and FP registers
    V8-V15 - callee-saved registers (non-volatile, used for temp vars by compilers; only the low 64 bits, D8-D15, are preserved)
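
    As a rough illustration of the list above, a function that uses X19 and X20 and also calls other functions could save and restore them as sketched here. The 32-byte frame size and the choice of X19/X20 are assumptions made for this example; the instructions themselves are the same on Android and iOS.

    ```asm
    stp   x29, x30, [sp, #-32]!   // push frame pointer and LR, reserving 32 bytes (SP stays 16-byte aligned)
    mov   x29, sp                 // establish this function's frame pointer
    stp   x19, x20, [sp, #16]     // save the callee-saved registers this function will modify

    // ... function body: x19/x20 keep their values across any calls made here ...

    ldp   x19, x20, [sp, #16]     // restore the callee-saved registers
    ldp   x29, x30, [sp], #32     // restore frame pointer and LR, then release the frame
    ret                           // return through x30
    ```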

    Next up are the assembler directives needed to correctly create the "segments" for your code:

    Android
    .cpu generic+fp+simd
    .text
    For each function, add these 3 lines:
    .section .text.MyFunctionName,"ax",%progbits
    .align 2
    .type MyFunctionName, %function

    iOS (nothing is really needed except the align directive)
    .align 2

    Declaring public (global) labels

    Android
    .global MyFunctionName

    iOS
    .globl _MyFunctionName   // note the leading underscore and the different spelling of the global directive
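
    Putting the directives above together, a complete function might look like the sketch below on each platform. The two-instruction body (adding the first two integer arguments) is only a placeholder chosen for this example, and the two halves are meant to live in separate source files.

    ```asm
    // ---- Android (ELF) ----
        .cpu generic+fp+simd
        .text
        .section .text.MyFunctionName,"ax",%progbits
        .align 2
        .global MyFunctionName
        .type MyFunctionName, %function
    MyFunctionName:
        add   w0, w0, w1          // placeholder body: return arg0 + arg1
        ret

    // ---- iOS (Mach-O), separate source file ----
        .text
        .align 2
        .globl _MyFunctionName
    _MyFunctionName:              // C code still calls this as MyFunctionName()
        add   w0, w0, w1          // placeholder body: return arg0 + arg1
        ret
    ```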

    The next difference is in getting a pointer to static data defined in your source code. For instance, let's say you have a data table and you would like to load register X0 with a pointer to the table.

    Android

      adrp  x0, MyDataTable
      add   x0, x0, #:lo12:MyDataTable
    

    iOS

      adrp  x0,MyDataTable@PAGE
      add   x0,x0,MyDataTable@PAGEOFF
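
    If the same source file has to build for both platforms, one possible approach is to hide the difference behind an assembler macro. The sketch below assumes the file is run through the C preprocessor (for example a `.S` file) and that `__APPLE__` is defined for iOS builds; the macro name `load_address` is made up for this example.

    ```asm
    .macro load_address rd, sym
    #if defined(__APPLE__)
        adrp  \rd, \sym@PAGE          // Mach-O: page address of the symbol
        add   \rd, \rd, \sym@PAGEOFF  // Mach-O: offset of the symbol within its page
    #else
        adrp  \rd, \sym               // ELF: page address of the symbol
        add   \rd, \rd, :lo12:\sym    // ELF: low 12 bits of the symbol address
    #endif
    .endm

        load_address x0, MyDataTable  // x0 = pointer to MyDataTable on either platform
    ```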
    

    Next, NEON syntax. iOS allows the size (arrangement) information to be appended to the instruction mnemonic, while Android wants to see it as a suffix on the register:

    Android
    ld1 {v0.16b},[x0],#16

    iOS
    ld1.16b {v0},[x0],#16

    Nested Calls
    In 32-bit ARM code it was typical to push LR on the stack to preserve it when you need to call a function from within a function. Since NEON instructions are no longer in a co-processor and have been merged into the main instruction set of AArch64, there's no penalty for moving data back and forth between general-purpose and NEON registers. It's now practical to preserve X30 (LR) in an unused NEON register, provided the code you call does not overwrite that register (V0-V7 and V16-V31 are volatile across standard calls). For example:

      fmov d0,x30   // preserve LR
      
      fmov x30,d0   // restore LR
    

    That's all for now. If someone finds specific cases where there are more differences, I'll add them.
