I could not find an objective study of ARC's performance impact in a real-life project. The official documentation says:
The compiler efficiently eliminates many
A noticeable performance regression is most likely where you send messages or call C/C++ functions that take object parameters and where each such function executes relatively few instructions. The compiler will insert a retain/release pair for each parameter and will not optimize it away again later.
Depending on the context, the compiler may recognize that certain retain/release pairs are unnecessary. However, I've noticed that even when the called function is declared static inline and resides in the same translation unit as the caller, the compiler does not optimize away the unnecessary retain/release pairs for its parameters.
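To make the pattern concrete, here is a minimal sketch of the kind of call described above: a tiny static inline function with one object parameter, called in a loop. The function name `name_length` and the sample data are hypothetical, and the comments only show roughly where ARC conceptually places the retain/release pair. Whether the pair survives in the generated code depends on the compiler version and optimization level.

```objc
// Build (assumed toolchain): clang -fobjc-arc -O2 -framework Foundation sketch.m
#import <Foundation/Foundation.h>

// Hypothetical helper: very little work per call, one object parameter.
static inline NSUInteger name_length(NSString *name) {
    // Under ARC, `name` behaves like a __strong local variable, so
    // conceptually the compiler emits:
    //   objc_retain(name);    // on entry
    NSUInteger n = name.length;
    //   objc_release(name);   // on exit
    return n;
}

int main(void) {
    @autoreleasepool {
        NSArray<NSString *> *names = @[@"a", @"bb", @"ccc"];
        NSUInteger total = 0;
        for (NSString *s in names) {
            // Each call may carry one retain/release pair for `s`,
            // which can outweigh the single -length message inside.
            total += name_length(s);
        }
        NSLog(@"total length: %lu", (unsigned long)total);
    }
    return 0;
}
```

To check what your own compiler does, inspect the generated assembly (for example with `clang -S`) and look for calls to `objc_retain` and `objc_release` around the call site.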