I am trying to do some bare-metal programming in ARM with GCC and testing on QEMU. Whenever I call into an ARM label from C, my program hangs. I have a simple example of code
In order to call an ARM mode function defined in assembly from a THUMB mode function defined in C, you need to define a symbol in assembly as a function, and the tools (Linaro gcc) will produce a blx instruction instead of bl.
Example:
@ Here, we suppose that this part of code is inside of .code 32
.type fn, %function
fn:
bx lr    @ return with bx so the caller's Thumb state is restored
See the qemu directory of http://github.com/dwelch67/yagbat.
Here are a couple of examples of calling ARM or Thumb from ARM:
start_vector:
mov sp,#0x20000
;@ call an arm function from arm
bl notmain
;@ call a thumb function from arm
ldr r0,=0xAABBAABB
bl hexstring_trampoline
;@ call a thumb function from arm
ldr r0,=0x12341234
ldr r1,hexstring_addr
mov lr,pc
bx r1
;@ call a thumb function from arm
ldr r0,=0x12312344
bl hexstring_trampoline
hang:
b hang
hexstring_trampoline:
ldr r1,hexstring_addr
bx r1
hexstring_addr: .word hexstring
If you look at the instruction set reference you will see that you need to use BX or BLX to switch between ARM and Thumb states. BLX is not as widely supported as BX: BX is available from ARMv4T, while BLX requires ARMv5T or later.
By definition, the program counter (pc) reads as two instructions ahead of the instruction being executed: 4 bytes in Thumb state, 8 bytes in ARM state. Either way, two instructions. bl cannot change state, so to simulate it you load the link register with the return address yourself and use bx to branch to the function; bx changes state depending on the lsbit of the address. So, given:
mov lr,pc
bx r1
here:
The mov lr,pc above loads the address of here:, which is our return address, and bx r1 calls the function in a state-independent manner. The lsbit of the address in lr indicates the mode to return to, so you always need to use bx to return:
pre_thumb:
mov pc,lr
thumb_capable:
bx lr
The compiler emits a bl instruction for each function call and the linker fills in the rest later. If the target is too far away for bl to reach, the linker adds a trampoline function itself. Likewise, if the call needs to change modes, the bl calls a trampoline that does the switch. I modeled that in one of the calls above (hexstring_trampoline); you can see it is a bit wasteful. Hopefully the point that the compiler only allocates space for a bl makes the reason clear: the alternative, always planning for a mode change and inserting nops for the majority of calls that do not need one, would be more wasteful.
The code also includes a call to arm from thumb in assembler:
.thumb
.thumb_func
.globl XPUT32
XPUT32:
push {lr}
;@ call an arm function from thumb asm
ldr r2,=PUT32
mov lr,pc
bx r2
pop {r2}
bx r2
This is mostly the same, except that you cannot pop to lr in Thumb mode. You can pop to pc, but I don't think that switches modes (it does not on ARMv4T; ARMv5T and later do interwork on a pop to pc), so you can't rely on it; you again need a spare register. You of course need to know the calling convention to know which registers you can use, or you can wrap another set of pushes and pops around the call to preserve everything but lr:
push {r2}
push {lr}
;@ call an arm function from thumb asm
ldr r2,=PUT32
mov lr,pc
bx r2
pop {r2} ;@ saved lr (pushed last, so popped first)
mov lr,r2
pop {r2} ;@ restore the caller's r2
bx lr
Thumb to Thumb or ARM to ARM, you just use a bl if you can reach, or ldr pc,address if you can't.
To eliminate the confusion:
The problem was that Ubuntu's GCC cross-compiler for ARM generates Thumb (16-bit) instructions by default. As other answers here show, calling between the two is possible. The GNU assembler detected that the C code was Thumb and happily generated shims using bx to set the mode correctly when calling into C, but I have no control over what GCC itself generates for function calls. It was calling my functions with a plain bl, which broke because my assembly code has to be ARM (32-bit) instructions.
The solution (which is poorly documented) is to pass -marm to gcc, which at least makes all of the code the same type.
If there is a switch to get gcc to generate bx calls for functions, that would probably work as well.
If you assemble your asm code as Thumb, you need to mark the function as a Thumb function, so that the linker uses the correct instruction when branching to it (e.g. BLX, or BX to an address with the low bit set). This is done with the .thumb_func directive:
.global activate
.thumb_func
activate:
b test
Another option is to explicitly ask the assembler to generate ARM code:
.code 32
.global activate
activate:
b test
Check this article too, although remember that current processors don't need many workarounds that were necessary in ARMv4, so you probably shouldn't follow it blindly.