C is not always faster.
C is slower than, for example Modern Fortran.
C is often slower than Java for some things. ( Especially after the JIT compiler has had a go at your code)
C lets pointer aliasing happen, which means some good optimizations are not possible. Particularly when you have multiple execution units, this causes data fetch stalls. Ow.
The assumption that pointer arithmetic works really causes slow bloated performance on some CPU families (PIC particularly!) It used to suck the big one on segmented x86.
Basically, when you get a vector unit, or a parallelizing compiler, C stinks and modern Fortran runs faster.
C programmer tricks like thunking ( modifying the executable on the fly) cause CPU prefetch stalls.
You get the drift ?
And our good friend, the x86, executes an instruction set that these days bears little relationship to the actual CPU architecture. Shadow registers, load-store optimizers, all in the CPU. So C is then close to the virtual metal. The real metal, Intel don't let you see. (Historically VLIW CPU's were a bit of a bust so, maybe that's no so bad.)
If you program in C on a high-performance DSP (maybe a TI DSP ?), the compiler has to do some tricky stuff to unroll the C across the multiple parallel execution units. So in that case C isn't close to the metal, but it is close to the compiler, which will do whole program optimization. Weird.
And finally, some CPUs (www.ajile.com) run Java bytecodes in hardware. C would a PITA to use on that CPU.