I'm personally a fan of XHProf, one of Facebook's open source projects. Used alongside Xdebug profiling dumps, it is invaluable for pinpointing performance bottlenecks. Plus, the UI rocks, particularly the weighted, image-based callgraph view.
I have used it across the Gawker Media network in the past (again, along with Xdebug-style dumps) to help focus our performance-related development efforts.