I've got some assembly functions I've ported to 64-bit ARM, and they work fine on Android, but when I tried to compile the same files in Xcode, I discovered that clang use
You are in luck. The Libav team maintains a tool that accepts gas syntax and outputs assembly for Apple's assembler. You can find the tool here: https://github.com/libav/gas-preprocessor/blob/master/gas-preprocessor.pl
Let's use my answer as a general guide to writing ARM64 code on Android and iOS. To begin, here are the volatile and non-volatile registers (per Wikipedia):
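X0-X7 - arguments and return value (volatile)
X8 - indirect result (struct) location (or temp reg)
X9-X15 - temporary (volatile)
X16-X17 - intra-procedure-call scratch registers (IP0/IP1, used by PLT stubs and the linker) or temp
X18 - platform-specific use (e.g. TLS); reserved on Apple platforms, so don't use it in code meant for iOS
X19-X28 - callee saved registers (non-volatile)
X29 - frame pointer (FP)
X30 - link register (LR)
SP - stack pointer; the same register number (31) is read as the zero register (XZR) by most other instructions
V0-V7, V16-V31 - volatile NEON and FP registers
V8-V15 - callee saved registers (non-volatile, used for temp vars by compilers); only the low 64 bits (D8-D15) are preserved across calls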
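As a rough illustration of how the table above translates into code, here is a minimal sketch of a non-leaf function that uses two callee-saved integer registers and one callee-saved NEON register. The function name, the choice of registers, and the 48-byte frame layout are assumptions made for the example, not something the calling convention dictates.

```asm
// Hypothetical function, written in GNU (Android) syntax.
MyFunctionName:
    stp x29, x30, [sp, #-48]!   // push FP and LR, reserve 48 bytes (SP stays 16-byte aligned)
    mov x29, sp                 // establish the frame pointer
    stp x19, x20, [sp, #16]     // save the callee-saved integer registers we intend to use
    str d8, [sp, #32]           // only the low 64 bits of V8 have to be preserved

    // ... body: x19, x20 and v8 survive any function calls made here,
    //     while x0-x17 and v0-v7/v16-v31 must be assumed clobbered ...

    ldr d8, [sp, #32]           // restore in reverse order
    ldp x19, x20, [sp, #16]
    ldp x29, x30, [sp], #48     // pop FP and LR, release the frame
    ret
```

A leaf function that only touches volatile registers (X0-X17, V0-V7, V16-V31) needs none of this and can simply end with ret.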
Next up are the assembler directives needed to correctly create the "segments" for your code:
Android
.cpu generic+fp+simd
.text
For each function, add these 3 lines:
.section .text.MyFunctionName,"ax",%progbits
.align 2
.type MyFunctionName, %function
iOS (Nothing really needed except for the align directive)
.align 2
Declaring public (global) labels
Android
.global MyFunctionName
iOS
.globl _MyFunctionName <--notice the leading underscore and different spelling of the global directive
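One way to keep a single source file for both platforms is to let the C preprocessor pick the right spelling. The sketch below assumes the file has a .S extension (so the compiler driver runs the preprocessor first) and that __APPLE__ is defined when building with Xcode's clang; the SYM macro name is made up for this example.

```asm
// SYM() is a hypothetical helper: it adds the leading underscore on Apple targets only.
#ifdef __APPLE__
#define SYM(name) _##name
#else
#define SYM(name) name
#endif

    .text
    .align 2                            // 2^2 = 4-byte alignment on both platforms
    .globl SYM(MyFunctionName)          // GNU as accepts .globl as well as .global
#ifndef __APPLE__
    .type SYM(MyFunctionName), %function   // ELF-only directive, not needed for Mach-O
#endif
SYM(MyFunctionName):
    ret
```

From C the function is declared the same way on both platforms (for example void MyFunctionName(void);), because the compiler adds the underscore itself on iOS.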
The next difference is in getting a pointer to static data defined in your source code. For instance, let's say you have a data table and you would like to load register X0 with a pointer to the table.
Android
adrp x0, MyDataTable
add x0, x0, #:lo12:MyDataTable
iOS
adrp x0,MyDataTable@PAGE
add x0,x0,MyDataTable@PAGEOFF
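The two address-loading forms can be hidden behind an assembler macro so the rest of the code stays identical. This is a sketch under the same assumptions as before (a preprocessed .S file, __APPLE__ defined for iOS builds); the macro name load_addr is hypothetical. A similar pattern appears in FFmpeg's AArch64 assembly helpers.

```asm
// load_addr reg, sym : put the address of sym into reg (hypothetical macro name)
.macro load_addr reg, sym
#ifdef __APPLE__
    adrp \reg, \sym@PAGE            // 4 KB page containing the symbol
    add  \reg, \reg, \sym@PAGEOFF   // offset of the symbol within that page
#else
    adrp \reg, \sym
    add  \reg, \reg, #:lo12:\sym    // low 12 bits of the symbol address
#endif
.endm

    // Usage: x0 = &MyDataTable on either platform
    load_addr x0, MyDataTable
```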
Next, NEON syntax. iOS allows the size (arrangement) information to be appended to the instruction mnemonic, while Android wants to see it as a suffix on each vector register:
Android
ld1 {v0.16b},[x0],#16
iOS
ld1.16b {v0},[x0],#16
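To show how the two spellings line up on more than one instruction, here is a small sketch that adds two 16-byte vectors. The routine name and its argument layout (x0 and x1 point at the inputs, x2 at the output) are assumptions for the example, and it again relies on a preprocessed .S file to select the variant.

```asm
// Hypothetical routine: *x2 = *x0 + *x1, treating each operand as 16 bytes.
AddBytes16:
#ifdef __APPLE__
    ld1.16b {v0}, [x0], #16     // arrangement on the mnemonic
    ld1.16b {v1}, [x1], #16
    add.16b v0, v0, v1
    st1.16b {v0}, [x2], #16
#else
    ld1 {v0.16b}, [x0], #16     // arrangement on each register
    ld1 {v1.16b}, [x1], #16
    add v0.16b, v0.16b, v1.16b
    st1 {v0.16b}, [x2], #16
#endif
    ret
```

Only the placement of the arrangement differs between the two variants; the instructions and the resulting machine code are the same.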
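Nested function calls
In 32-bit ARM code it was typical to push LR on the stack to preserve it when you need to call a function from within a function. Since NEON instructions are no longer in a co-processor and have been merged into the main instruction set of AArch64, there's little penalty for moving data back and forth. It's now practical to preserve X30 (LR) in an unused NEON register, provided the code you call leaves that register alone. Under the standard calling convention V0-V7 and V16-V31 are volatile, so this is only safe when the callees are known not to touch the chosen register (your own leaf routines, for instance). For example: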
fmov d0,x30 // preserve LR
<some code which makes function calls>
fmov x30,d0 // restore LR
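When the callee is arbitrary code (a C function, say) and no register can be guaranteed to survive the call, the conventional stack-based approach still applies. A minimal sketch, with SomeOtherFunction standing in as a hypothetical callee:

```asm
    stp x29, x30, [sp, #-16]!   // push FP and LR as a pair (keeps SP 16-byte aligned)
    mov x29, sp                 // establish the frame pointer
    bl  SomeOtherFunction       // bl overwrites x30 with the return address
    ldp x29, x30, [sp], #16     // restore FP and the original LR
    ret                         // return to our own caller
```

On iOS the branch target would be spelled _SomeOtherFunction, following the underscore rule described earlier.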
That's all for now. If someone finds specific cases where there are more differences, I'll add them.