I want my exception handlers and debug functions to be able to print call stack backtraces, basically just like the backtrace() library function in glibc. Unfortunately, my C li
ARM code is often compiled without a frame pointer, so you never quite know how big each stack frame is and cannot simply unwind the stack beyond the single return address held in R14 (LR).
When investigating a crash for which we do not have debug symbols, we simply dump the whole stack and look up the closest symbol to each item that falls within the instruction range. This generates a lot of false positives, but it can still be very useful for investigating crashes.
If you are running pure ELF executables, you can separate the debug symbols out of your release executable. gdb can then help you find out what is going on from your standard Unix core dump.
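The "closest symbol" lookup described above can be sketched as a binary search over a sorted symbol table. This is only an illustration: `struct symbol`, `closest_symbol`, and the addresses in `example_symbols` are hypothetical names and values, and a real table would typically be generated from `nm` output or a linker map.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol-table entry (not from any particular toolchain). */
struct symbol {
    uint32_t    address;  /* start address of the function */
    const char *name;
};

/*
 * Return the entry with the greatest start address that is <= addr,
 * or NULL if addr lies below the first symbol.
 * The table must be sorted by ascending address.
 */
static const struct symbol *closest_symbol(const struct symbol *table,
                                           size_t count, uint32_t addr)
{
    const struct symbol *best = NULL;
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].address <= addr) {
            best = &table[mid];   /* candidate; look for a closer one above */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}

/* Made-up table, for illustration only. */
static const struct symbol example_symbols[] = {
    { 0x08000100u, "main"  },
    { 0x08000180u, "func1" },
    { 0x08000200u, "func2" },
};
```

Every word on the dumped stack that falls inside the code range would be passed through such a lookup; words that merely happen to look like code addresses are where the false positives come from.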
Some compilers, like GCC, optimize function calls like the ones in your example. For the code fragment to work, the intermediate return addresses in the call chain do not need to be stored. It's perfectly OK to return from func3() to main(), as the intermediate functions don't do anything extra besides calling another function.
It's not the same as code elimination (actually the intermediate functions could be completely optimized out), and a separate compiler parameter might control this kind of optimisation.
If you use GCC, try -fno-optimize-sibling-calls.
Another handy GCC option is -mno-sched-prolog, which prevents instruction reordering in the function prologue. That is vital if you want to parse the code byte by byte, as is done here:
http://www.kegel.com/stackcheck/checkstack-pl.txt
This is hacky, but I've found it works well enough considering the amount of code/RAM space required:
Assuming you're using ARM THUMB mode, compile with the following options:
-mtpcs-frame -mtpcs-leaf-frame -fno-omit-frame-pointer
The following function is used to retrieve the callstack. Refer to the comments for more info:
/*
 * This should be compiled with:
 * -mtpcs-frame -mtpcs-leaf-frame -fno-omit-frame-pointer
 *
 * With these options, the stack pointer is automatically pushed to the stack
 * at the beginning of each function.
 *
 * This function basically iterates through the current stack finding the
 * following combination of values:
 * - <Frame Address>
 * - <Link Address>
 *
 * This combination will occur for each function in the call stack.
 */
#include <stdint.h>
#include <string.h>

static void backtrace(uint32_t *caller_list, const uint32_t *caller_list_end, const uint32_t *stack_pointer)
{
    uint32_t previous_frame_address = (uint32_t)stack_pointer;
    uint32_t stack_entry_counter = 0;

    // Be sure to clear the caller_list buffer.
    // memset() takes a size in bytes, so scale the element count.
    memset(caller_list, 0, (size_t)(caller_list_end - caller_list) * sizeof(*caller_list));

    // Loop until the buffer is full
    while (caller_list < caller_list_end)
    {
        // Attempt to obtain the next stack pointer.
        // The link address should come immediately after.
        const uint32_t possible_frame_address = *stack_pointer;
        const uint32_t possible_link_address = *(stack_pointer + 1);

        // Have we searched past the allowable size of a given stack?
        if (stack_entry_counter > PLATFORM_MAX_STACK_SIZE / 4)
        {
            // Yes, so just quit
            break;
        }
        // Next check that the frame address (i.e. stack pointer for the function)
        // and link address are within an acceptable range
        else if ((possible_frame_address > previous_frame_address) &&
                 (possible_frame_address < previous_frame_address + PLATFORM_MAX_STACK_SIZE) &&
                 ((possible_link_address & 0x01) != 0) && // in THUMB mode the address will be odd
                 (possible_link_address > PLATFORM_CODE_SPACE_START_ADDRESS &&
                  possible_link_address < PLATFORM_CODE_SPACE_END_ADDRESS))
        {
            // We found two acceptable values.
            // Store the link address
            *caller_list++ = possible_link_address;

            // Update the book-keeping variables for the next search
            previous_frame_address = possible_frame_address;
            stack_pointer = (uint32_t *)(possible_frame_address + 4);
            stack_entry_counter = 0;
        }
        else
        {
            // Keep iterating through the stack until we find an acceptable combination
            ++stack_pointer;
            ++stack_entry_counter;
        }
    }
}
You'll need to update #defines for your platform.
Then call the following to populate a buffer with the current call stack:
uint32_t callers[8];
uint32_t sp_reg;
__ASM volatile ("mov %0, sp" : "=r" (sp_reg) );
backtrace(callers, &callers[8], (uint32_t*)sp_reg);
Again, this is rather hacky, but I've found it to work quite well. The buffer will be populated with link addresses of each function call in the call stack.
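Because the stored link addresses are Thumb return addresses, bit 0 is set (that is what the odd-address check in the function relies on). Before looking them up in a map file or symbol table you will probably want to clear that bit. A minimal sketch, where `thumb_code_address` is a hypothetical helper name and not part of the code above:

```c
#include <stdint.h>

/*
 * Convert a Thumb link address (bit 0 set) into the address of the
 * instruction it refers to, by clearing the Thumb state bit.
 */
static uint32_t thumb_code_address(uint32_t link_address)
{
    return link_address & ~(uint32_t)1u;
}
```

Note that the result is the return address, i.e. the instruction after the call, not the call instruction itself.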
Does your executable contain debugging information, from compiling with the -g option? I think this is required to get a full stack trace without a frame pointer.
You might need -gdwarf-2 to make sure it uses a format that includes unwind information.
For this you need -funwind-tables or -fasynchronous-unwind-tables. On some targets this is required for _Unwind_Backtrace to work properly!
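For reference, a minimal sketch of walking the stack with _Unwind_Backtrace from GCC's <unwind.h>. The names `trace_state`, `trace_callback`, and `capture_backtrace` are made up for this example, and whether it produces useful frames depends on your target having unwind tables (hence the flags above).

```c
#include <stdint.h>
#include <unwind.h>

/* Hypothetical bookkeeping structure passed to the callback. */
struct trace_state {
    uintptr_t *frames;    /* output buffer for instruction addresses */
    int        capacity;  /* number of slots in the buffer */
    int        count;     /* number of slots filled so far */
};

/* Called by the unwinder once per stack frame. */
static _Unwind_Reason_Code trace_callback(struct _Unwind_Context *context, void *arg)
{
    struct trace_state *state = arg;

    if (state->count >= state->capacity)
        return _URC_END_OF_STACK;  /* any code other than _URC_NO_REASON stops the walk */

    state->frames[state->count++] = (uintptr_t)_Unwind_GetIP(context);
    return _URC_NO_REASON;         /* continue with the next frame */
}

/* Fill frames[] with up to capacity addresses; returns how many were stored. */
static int capture_backtrace(uintptr_t *frames, int capacity)
{
    struct trace_state state = { frames, capacity, 0 };

    _Unwind_Backtrace(trace_callback, &state);
    return state.count;
}
```

The addresses still have to be mapped to function names separately, for example with a map file or addr2line.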
GCC performs tail-call optimization here. func1() and func2() do not call func2()/func3(); instead they jump to them, so func3() can return directly to main().
In your case, func1() and func2() do not need to set up a stack frame, but even if they did (e.g. for local variables), GCC could still apply the optimization when the function call is the last thing the function does: it then cleans up the stack before the jump to func3().
Have a look at the generated assembly code to see it.
Edit/Update:
To verify that this is the reason, do something after the function call that the compiler cannot reorder (e.g. use the return value). Or just try compiling with -O0.
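A small sketch of that experiment, assuming a func1()/func2()/func3() chain like the one in the question (the bodies here are invented for illustration). Each caller does work after its call returns, so the call cannot be turned into a plain jump. `noinline` is a GCC/Clang attribute, and the volatile variable keeps the result from being computed at compile time.

```c
/* Volatile so the compiler cannot constant-fold the whole chain away. */
static volatile int depth_seed = 3;

__attribute__((noinline)) static int func3(void)
{
    return depth_seed;
}

__attribute__((noinline)) static int func2(void)
{
    int result = func3();  /* a real call: the addition below happens after it returns */
    return result + 1;
}

__attribute__((noinline)) static int func1(void)
{
    int result = func2();  /* same here, so func1 must stay on the call stack */
    return result + 1;
}
```

If a backtrace taken inside func3() now shows func2() and func1(), the missing frames were indeed caused by the sibling-call optimization.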