Yes, there are lots of such tools. MPI defines a profiling interface (PMPI) that allows other libraries to interpose themselves on your MPI function calls and collect counts, timings, etc.
A very small MPI profiling tool is mpiP - it gives a very short summary of MPI activity in your code.
The IPM library is fairly easy to build, gives you lots of MPI counts and times, and produces a nice HTML report. You mention PAPI; IPM will also integrate PAPI counters if available. We use this regularly at our centre, and I think it would do what you want. If you've built your program with dynamic libraries for MPI, you don't even need to recompile to use it (mpiP has the same property).
Jumpshot, which comes with MPICH2 but can be built with any MPI, actually shows on a timeline how long each MPI operation took.
OpenSpeedshop gives very detailed performance measurements of your code, highlighting especially "expensive" lines; it also has an MPI-tracing mode which will identify MPI times by line of code. It can be tricky to install.
At the commercial end of the spectrum there are Vampir from TU Dresden and Intel Trace Analyzer and Collector (ITAC). Vampir collects source-level, MPI and OpenMP traces using the open source VampirTrace library, which also integrates with PAPI to provide detailed event and counter tracing. VampirTrace writes its traces in the Open Trace Format, which can be read by various other tools besides Vampir.
ITAC is part of Intel Cluster Studio XE. It is mostly designed to work with Intel MPI and, since it shares ancestral code with Vampir, provides more or less the same functionality. One of its nice features is the included automatic run-time MPI correctness checker.
Allinea MAP is an MPI profiler that provides performance analysis with an integrated source browser, which displays the communication/computation cost alongside individual lines of the source code. It also shows high-level graphs of performance information, including memory, CPU instructions and communication.
There are also higher-level tools which not only give reports, but actually offer advice. TACC's perfexpert is a command-line tool which takes a number of measurements and offers some performance tuning advice. Scalasca, out of Jülich, recompiles your code with a lot of source-level instrumentation and can point out load imbalances, particularly expensive MPI collectives, etc. It can also integrate with Vampir for detailed trace analysis.