I am working with a multithreaded embedded application. Each thread is allocated stack sizes based on its functionality. Recently we found that one of the thread corrupted the s
As Lee mentions, your best bet might be to port Electric Fence to your ARM9 proprietary compiler. Failing that, the ARM ABI and stack format is well documented, so you could write a CHECK_STACK function that verifies that the return addresses point to functions, etc.
However, it's hard to truly write some of these checks though unless you're the compiler, so if you're not particularly tied to this compiler, GCC does support ARM and it also supports stack guards.
What compiler are you using? I'm guessing a OS specific one. If you're using GCC, you may be able to use the Stack-Smashing Protector. This might be a fix for your production system prevent the issue, and would also allow you to detect it in development.
To effectively check for stack corruption, you need to check your available stack space, put guards on both sides of the stack arguments before the call, make the call, and then check the guards on the call's return. This kind of change generally requires modification to the code which the compiler generates.
Ideally valgrind would support your platform/OS. It's shocking to me that you don't get a separate vm memory region for each thread's stack. If there's any way to build your app so it can run on linux as well, you can probably reproduce the bug there and catch it with valgrind.
Do you have the kernel source? The last time I wrote a kernel, I added (as an option) stack checking in the kernel itself.
Whenever a context switch was going to occur, the kernel would check 2 stacks:
(1) The task being swapped out -->if the task blew its stack while it was running, let's know right now.
(2) The destination (target) task --> before we jump into the new task, let's make sure some wild code didn't clobber its stack. If its stack is corrupted, don't even switch into the task, we're screwed.
Theoretically the stacks of all tasks could be checked, but the above comments provide the rationale for why I checked these 2 stacks (configurable).
In addition to this, application code can monitor tasks (incl. the interrupt stack if you have one) in the idle loop, the tick ISR, etc...
When working on an embedded platform recently, I looked high and low for ways to do this (this was on an ARM7).
The suggested solution was what you've already come up with: initialize the stack with a known pattern and make sure that pattern exists after returning from a function. I thought the same thing "there's got to be a better way" and "hasn't someone automated this". The answer to both questions was "No" and I had to dig in just as you've done to try to find where the corruption was occuring.
I also "rolled my own" exception vectors for the data_abort, etc. There are some great examples on the 'net of how to backtrace the call stack. This is something you could do with a JTAG debugger, break when any of these abort vectors occurs and then investigate the stack. This can be useful if you only have 1 or 2 breakpoints (which seems to be the norm for ARM JTAG debugging).
ARM9 has JTAG/ETM debugging support on-die; you should be able to set up a data access watchpoint covering e.g. 64 bytes near the top of your stacks, which would then trigger a data abort, which you could catch in your program or externally.
(The hardware I work with only supports 2 read/write watchpoints, not sure if that's a limitation of the on-chip stuff or the surrounding third-party debug kit.)
This document, which is an extremely low-level description of how to interface with the JTAG functionality, suggests you read your processor's Technical Reference Manual -- and I can vouch that there's a decent amount of higher-level info in chapter 9 ("Debug Support") for the ARM946E-S r1p1 TRM.
Before you dig into understanding all this stuff (unless you're just doing it for fun/education), double-check that the hardware and software you're using won't already manage breakpoints/watchpoints for you. The concept of "watchpoint" was a bit hard to find in the debugging software we use -- it was a tab labelled "Hardware" in the add breakpoint dialog.
Another alternative: your compiler may support a command-line option to add function calls at the entry and exit points of functions (some sort of "void enterFunc(const char * callingFunc)" and "void exitFunc(const char * callingFunc)"), for function cost profiling, more accurate stack tracing, or similar. You can then write these functions to check your stack canary value.
(As an aside, in our case we actually ignore the function name that is passed in (I wish I could get the linker to strip these) and just use the processor's link register (LR) value to record where we came from. We use this for getting accurate call traces as well as profiling information; checking the stack canaries at this point would be trivial too!)
The problem is, of course, that calling these functions changes the register and stack profiles for the functions a bit... Not much, in our experiments, but a bit. The performance implications are worse, and wherever there's a performance implication there's the chance of a behavior change in the program, which may mean you e.g. avoid triggering a deep-recursion case that you might have before...
Very late update: these days, if you have a clang+LLVM based pipeline, you may be able to use Address Sanitizer (ASAN) to catch some of these. Be on the lookout for similar features in your compiler! (It's worth knowing about UBSAN and the other sanitizers too.)