I wonder whether there is any automatic way of determining (at least roughly) the Big-O time complexity of a given function?
If I graphed an O(n) function against one with a higher growth rate, I think I would be able to visually ascertain which is which, so I'm hoping there is some heuristic that makes this possible automatically.
A short answer is that it's impossible because constants matter.
For instance, I might write a function that runs in O((n^3/k) + n^2). This simplifies to O(n^3), because as n approaches infinity the n^3 term will dominate the function, irrespective of the constant k.
However, if k is very large in the above example, the function will appear to run in almost exactly n^2 until some crossover point, at which the n^3 term begins to dominate. Because the constant k is unknown to any profiling tool, it is impossible to know how large a dataset to test the target function with. If k can be arbitrarily large, you cannot craft test data to determine the big-O running time.
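To make the crossover concrete, here is a small illustrative Python sketch (the choice K = 10**6 is arbitrary, and f mirrors the example above). It prints how much of the total each term contributes as n grows; the cubic term only reaches parity with the quadratic term around n == K, so any benchmark that stops below that size would look quadratic.

    # Hypothetical example: f(n) = n**3 / K + n**2 with a large constant K.
    # Below the crossover point n == K, the n**2 term dominates every measurement.
    K = 10**6

    def f(n):
        return n**3 / K + n**2

    for n in (10**3, 10**4, 10**5, 10**6, 10**7):
        cubic_share = (n**3 / K) / f(n)  # fraction of the total due to the n^3/K term
        print(f"n={n:>8}  f(n)={f(n):.3e}  cubic term share={cubic_share:.2%}")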
I think it's pretty much impossible to do this automatically. Remember that O(g(n)) is the worst-case upper bound and many functions perform better than that for a lot of data sets. You'd have to find the worst-case data set for each one in order to compare them. That's a difficult task on its own for many algorithms.
If you have lots of homogeneous computational resources, I'd time the function against samples of several sizes, fit a curve by regression, and simply take the highest-order term.
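A minimal sketch of that approach, using only the standard library: time a stand-in workload (here a simple sort, purely as a placeholder for the function under test) at several input sizes, then fit a least-squares line to log(time) versus log(n); the slope approximates the dominant polynomial exponent.

    import math
    import timeit

    def workload(n):
        # Placeholder for the function being analyzed: sort a reversed list.
        sorted(range(n, 0, -1))

    sizes = [2**k for k in range(10, 17)]
    times = [timeit.timeit(lambda n=n: workload(n), number=5) / 5 for n in sizes]

    # Least-squares slope of log(time) vs. log(n) approximates the dominant exponent.
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    print(f"estimated exponent: {slope:.2f}")  # roughly 1 for this O(n log n) workload

Timer noise and cache effects will pull the estimate around, which is exactly the constants-matter caveat from the answer above.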
Well, since you can't prove whether or not a function even halts, I think you're asking a little much.
Otherwise @Godeke has it.
You can run the algorithm over data sets of various sizes and then use curve fitting to come up with an approximation. (Just looking at the curve you create will probably be enough in most cases, but any statistical package has curve fitting.)
Note that some algorithms exhibit one shape with small data sets, but another with large... and the definition of "large" remains a bit nebulous. This means that an algorithm with a good performance curve could have so much real-world overhead that (for small data sets) it doesn't work as well as a theoretically worse algorithm.
As far as code-inspection techniques go, none exist. But instrumenting your code to run at various input sizes and output a simple file (a RunSize RunLength pair per line would be enough) should be easy. Generating proper test data could be more complex (some algorithms work better or worse with partially ordered data, so you would want to generate data that represents your normal use case).
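For example, a minimal instrumentation sketch along those lines, with target() standing in for the routine being measured and "timings.txt" as an arbitrary output file name, writing one RunSize RunLength pair per line:

    import time

    def target(n):
        # Placeholder for the routine being measured.
        total = 0
        for i in range(n):
            total += i * i
        return total

    with open("timings.txt", "w") as out:
        for size in (1_000, 10_000, 100_000, 1_000_000):
            start = time.perf_counter()
            target(size)
            elapsed = time.perf_counter() - start
            out.write(f"{size} {elapsed:.6f}\n")  # RunSize RunLength

The resulting file can then be fed into whatever curve-fitting or plotting tool you prefer.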
Because of the problems with defining "what is large" and the fact that performance is data dependent, I find that static analysis is often misleading. When optimizing performance and selecting between two algorithms, the real-world "rubber hits the road" test is the only final arbiter I trust.
Proof that this is undecidable:
Suppose that we had some algorithm HALTS_IN_FN(Program, f) which, given a program and a function f, determined whether the program halts within O(f(n)) time on inputs of size n.
Let P be the following program:
if (HALTS_IN_FN(P, f))   // ask the hypothetical analyzer about this very program P
{
    while (1);           // it said "P halts in O(f(n))", so loop forever instead
}
halt;                    // it said "P does not halt in O(f(n))", so halt immediately
Since the function and the program are fixed, HALTS_IN_FN on this input is constant time. If HALTS_IN_FN returns true, the program runs forever and so certainly does not halt in O(f(n)). If HALTS_IN_FN returns false, the program halts in O(1) time, which is within O(f(n)) for any reasonable f.
Either way HALTS_IN_FN gives the wrong answer, which is a contradiction, so no such algorithm can exist and the problem is undecidable.