I could not find an objective study of ARC's performance impact on a real-life project. The official documentation says only that the compiler "efficiently eliminates many" redundant retain/release calls.
Of course temporary variables are strong by default. That's explicit and clearly documented, and if you think it through, it's what people typically want.
MyWidget *widget = [[MyWidget alloc] init]; // 1
[widget mill]; // 2
If widget isn't strong, the MyWidget created on line 1 can be released and the reference zeroed before line 2 runs!
Now, it's certainly true that if you use lots of temporary variables -- for example, if you are rigorously obeying the Law of Demeter -- in the middle of a tight loop, and if you're assuming that those temporary variables have no performance cost at all because the world has plenty of registers, then you're going to be surprised.
And that might be the corner you're inhabiting right now.
But that's an exotic and special place! Most code isn't in the middle of a tight loop. Most tight loops aren't performance bottlenecks. And most tight loops don't need lots of intermediate variables.
Conversely, ARC can do the autorelease optimization in ways that you can't do manually (though perhaps the optimizer can). So, if there's a function returning an autoreleased variable in your tight loop, you may be better off with ARC.
Premature optimization is a bad idea. You may be in an inescapable performance corner, but most people are not. I spend most of my time in OS X, to be sure, but it's been years since I've had a performance issue where the answer wasn't a better algorithm.
(Finally, if ARC is causing a 70% performance hit to your application, then you're doing an awful lot of memory management on your critical path! Think about that: you're spending 70% of your time allocating and releasing objects. This sounds like a textbook case for something like a Flyweight or an object cache or a recycling pool!)
Here are my ARC vs MRC performance measurements. The performance test project is available on github so you can add your own tests. Just be sure to run it on a device. The results in the Simulator are skewed, and often in favor of MRC.
To summarize:
ARC and MRC are about the same speed overall. General code should be faster under ARC, but tight loops can be slower, and significantly so.
In low-level tests ARC has a speed edge over MRC, thanks to optimizations such as autoreleased return values and @autoreleasepool.
There is some code where ARC inserts additional retain/release calls that would not strictly be necessary under MRC, as long as the app is single-threaded. Such code may be slower under ARC, though it only makes a difference in tight loops and depends heavily on the code in question.
For example, a method receiving an object should retain it even under MRC, because the object might otherwise be released by another thread while the method is running. The fact that you can omit that code under MRC makes it faster, but inherently less safe (you'll rarely run into such an issue, but when you do, you'll wish you hadn't). Example:
- (void)someMethod:(id)object
{
    [object retain];  // inserted by ARC; good practice under MRC
    [object doSomething];
    [object doAnotherThing];
    [object release]; // inserted by ARC; good practice under MRC
}
The genetic algorithm I used in the test project is roughly 40% slower with ARC because of this. It is an extreme example, though, because for an algorithm of that kind you should see far greater gains from rewriting the critical sections in C, given the many insert/remove operations on NSMutableArray and the many NSNumber objects being created.
It is downright negligent to dismiss ARC entirely because it can be slower in some situations. If you find those situations to be performance critical, then compile that code with -fno-objc-arc or rewrite it in C.
ARC should not be adopted or rejected on performance grounds. ARC is a tool that makes a programmer's job a lot easier. It's up to you to decide whether you enjoy hunting leaked objects and dangling-pointer crashes so much that you'd rather stick with MRC.
I think that if you see a similar performance regression, the only plausible explanation is that your manually managed code was "unsafe": it had potential memory leaks, or it omitted retain/release calls in a way that made the program's memory management unsafe.
I don't think ARC code is much slower than manually managed code, provided the manual code is well written and safe.
Of course, carefully hand-written MRC code could be slightly faster than ARC code, but at what cost? A lot more work done by hand. In most cases it is more trouble than it is worth!
In addition, I think ARC should really be compared with a garbage-collected environment rather than with perfectly written MRC; a human brain will always be smarter than a program (or at least I hope so :-) ).
However, if you have a well-written MRC code base and you are really sure it is safe and faster, why move it to ARC? Keep it manually memory managed, using the -fno-objc-arc flag. Using ARC is not mandatory, especially for these kinds of reasons.
I'm afraid this may be somewhat off topic, and partly based on guesswork...
I think most of the performance gain from ARC comes from inlining retain/release calls rather than eliding them.
Also, in my experience, ARC usually introduces extra retain/release calls. Because ARC is very strict and conservative, it mostly doesn't elide retain/release. And many of the newly inserted calls are semantically required but were simply omitted by programmers under MRC (for example, for function parameters and temporary variables).
So,
The count of retain/release calls actually increases a lot, to strictly satisfy semantic completeness.
A few of them will be elided by the very conservative optimizer.
The remaining retain/release calls are inlined by the optimizer, becoming direct C function calls instead of dynamic Objective-C message sends, so the cost of each call drops a lot.
As a result, we usually get decreased performance. I realized I had been omitting a lot of retain/release calls before using ARC. But as you pointed out, whatever we get is semantically complete, and calls can still be elided manually (and thus deterministically) by using __unsafe_unretained.
The cases where you may see a noticeable performance regression are where you send messages or call C/C++ functions that take object parameters, and where the number of instructions per call is relatively small. The compiler inserts (and does not later optimize away) a retain/release pair for each object parameter.
Considering the context, the compiler may recognize that certain retain/release pairs are unnecessary. However, I've noticed that even when the called function is declared static inline and resides in the same translation unit as the caller, the compiler is unable to optimize away the unnecessary retain/release pairs for its parameters.