I am working with a multithreaded embedded application. Each thread is allocated a stack size based on its functionality. Recently we found that one of the threads corrupted the s
When working on an embedded platform recently, I looked high and low for ways to do this (this was on an ARM7).
The suggested solution was what you've already come up with: initialize the stack with a known pattern and make sure that pattern still exists after returning from a function. I thought the same things you did: "there's got to be a better way" and "hasn't someone automated this?" The answer to both questions was "No", and I had to dig in, just as you've done, to find where the corruption was occurring.
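For what it's worth, the pattern check can be sketched roughly as below. This is only an illustration, not code from any particular RTOS: the names `STACK_FILL_PATTERN`, `stack_fill`, `stack_unused_words` and `stack_overflowed` are made up for the example, and it assumes a descending stack (the usual ARM convention), so the lowest addresses of the stack region are the last to be touched.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill value; any pattern unlikely to appear as real data works. */
#define STACK_FILL_PATTERN 0xDEADBEEFu

/* Fill the whole stack region with the known pattern before the thread starts. */
static void stack_fill(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL_PATTERN;
    }
}

/*
 * Count the words at the low end of the region that still hold the pattern.
 * Assumes a descending stack, so these are the words never reached so far.
 */
static size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL_PATTERN) {
        n++;
    }
    return n;
}

/* Return nonzero if any of the lowest guard_words words was overwritten. */
static int stack_overflowed(const uint32_t *stack, size_t guard_words)
{
    for (size_t i = 0; i < guard_words; i++) {
        if (stack[i] != STACK_FILL_PATTERN) {
            return 1;
        }
    }
    return 0;
}
```

You would call `stack_fill` once per thread stack at creation time, then call `stack_overflowed` at convenient checkpoints (after a suspect function returns, on a context switch, or from an idle task). `stack_unused_words` gives a rough high-water mark for sizing stacks. Keep in mind the check is only a heuristic: a thread that happens to write the pattern value itself, or that jumps over the guard words, will not be caught.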
I also "rolled my own" exception vectors for data_abort, etc. There are some great examples on the 'net of how to backtrace the call stack. This is something you could do with a JTAG debugger: break when any of these abort vectors is taken, then investigate the stack. This can be useful if you only have 1 or 2 breakpoints (which seems to be the norm for ARM JTAG debugging).