Luke Stackwalker seems promising -- it's not as polished as I'd like, but it is open source and it does do something that seems very close to what @Mike Dunlavey keeps saying we ought to do. (Of course, it then tries to smoosh it all down into the typically-unhelpful call graphs that Mike is so weary of, but it shouldn't be too hard to fix that with the source as our ally.)
It even seems to count time spent waiting in the kernel, as far as I can tell...