I could not find an objective study of ARC's performance impact in a real-world project. The official documentation says:
The compiler efficiently eliminates many
I'm afraid this is somewhat off topic, and it is partly based on guesswork...
I think most of the performance gain from ARC comes from inlining retain/release calls rather than from eliding them.
Also, in my experience, ARC usually introduces extra retain/release calls. Because ARC is strict and conservative, it mostly doesn't elide retain/release pairs. Many of the newly inserted calls are semantically required but were simply omitted by the programmer under MRC (for example, around passed-in function parameters and temporary variables).
So:
1. The number of retain/release calls actually increases a lot, in order to satisfy semantic completeness strictly.
2. A few of them are elided by a very conservative optimization.
3. The calls that remain are made cheaper by the optimizer: they become static C function calls instead of dynamic Objective-C message sends, so the cost of each call drops a lot.
As a result, we usually get decreased performance. I realized I had been omitting a lot of retain/release calls before using ARC. But as you pointed out, whatever we get is semantically complete, and the calls can still be elided manually (and therefore deterministically) by using __unsafe_unretained.
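To illustrate that last point, here is a minimal sketch of manual elision with __unsafe_unretained. The Node class and the SumList function are hypothetical names invented for this example, and the "no retain/release on the cursor" behavior is what I would expect from the ownership qualifier rather than something I have measured. It assumes compilation with something like clang -fobjc-arc -framework Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical node type. The chain is owned through the strong `next` ivars,
// so holding the head keeps every node alive.
@interface Node : NSObject {
@public
    Node *next;       // __strong by default under ARC
    NSInteger value;
}
@end

@implementation Node
@end

// Hypothetical helper: sums the values in the list.
static NSInteger SumList(Node *head) {
    NSInteger total = 0;
    // With a plain (__strong) cursor, ARC would retain the new node and
    // release the old one on every assignment. An __unsafe_unretained
    // cursor is a plain pointer store, so no retain/release is emitted for it.
    // This is only safe because `head` keeps the whole chain alive here.
    for (__unsafe_unretained Node *cursor = head; cursor != nil; cursor = cursor->next) {
        total += cursor->value;
    }
    return total;
}

int main(void) {
    @autoreleasepool {
        Node *a = [Node new];
        Node *b = [Node new];
        a->value = 1;
        b->value = 2;
        a->next = b;   // `a` now owns `b`
        NSLog(@"sum = %ld", (long)SumList(a));
    }
    return 0;
}
```

The trade-off is the usual one: the cursor becomes a dangling pointer if anything releases a node during the loop, so this only makes sense in hot paths where some other strong reference provably outlives the unretained one.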