I have a gnarly piece of code whose time-efficiency I would like to measure. Since estimating this complexity from the code itself is hard, I want to place it in a loop and time it for a range of input sizes.
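To make the question concrete, this is roughly the kind of harness I have in mind (a minimal sketch; `run_algorithm` is just a placeholder for the real code, and `timeit` is one way to tame the noise):

```python
import timeit

def run_algorithm(n):
    # Placeholder for the gnarly code under test; replace with the real call.
    return sum(i * i for i in range(n))

# Time the code for several input sizes, repeating each measurement
# and keeping the best (least noisy) of the repetitions.
for n in [1_000, 10_000, 100_000, 1_000_000]:
    best = min(timeit.repeat(lambda: run_algorithm(n), number=10, repeat=5))
    print(f"n={n:>9}  best time for 10 runs: {best:.4f} s")
```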
Measuring the time complexity can be very difficult (if it is possible at all), and I have never seen it done in algorithm papers. If you cannot calculate the time complexity from the (pseudo-)code or the algorithm description, then maybe you can use a heuristic to simplify the analysis.
Maybe you can also calculate the complexity of some parts of the algorithm and ignore other parts if they obviously have a much smaller complexity.
If nothing helps, the normal way would be to show how the algorithm scales on a machine, just as you wrote. But there are many things that affect the results.
All in all, I think you can only get an idea of how your algorithm scales; you cannot get an exact upper bound on the complexity by measuring the run-time. Maybe this works for really small examples, but for bigger ones you will not get reliable results.
The best you can do is to report exactly what you measured and how. This way you can see whether changes have improved the algorithm, and others can verify your results.
About the input:
First things first, I don't know of an accepted, "scientific" way to scale repetitions and problem size to achieve faster, more accurate time-vs-size plots, so I cannot say anything on the matter.
Other than that, for a better measurement of time complexity I would suggest measuring the average execution time for a fixed size and comparing it with the average execution time measured in the previous cycle. After that, you increase the size of the input data and repeat the measurement.
This is similar to one of the methods used in Numerical Analysis to estimate errors of numerical methods. You just adapt it to estimate the average error in the execution time of the implementation of your algorithm.
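A rough sketch of that loop, with a hypothetical `run_algorithm(n)` standing in for your implementation; the ratio between successive averages is the quantity you would track from one cycle to the next:

```python
import time

def run_algorithm(n):
    # Hypothetical stand-in for the implementation being measured.
    return sorted(range(n, 0, -1))

def average_time(n, repetitions=20):
    # Average wall-clock time of run_algorithm(n) over several repetitions.
    start = time.perf_counter()
    for _ in range(repetitions):
        run_algorithm(n)
    return (time.perf_counter() - start) / repetitions

previous = None
for n in [1_000, 2_000, 4_000, 8_000, 16_000]:
    avg = average_time(n)
    ratio = avg / previous if previous else float("nan")
    print(f"n={n:>6}  avg={avg:.6f} s  ratio to previous size: {ratio:.2f}")
    previous = avg
```

With doubling sizes, a ratio that settles near 2 suggests roughly linear growth, near 4 roughly quadratic, and so on.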
So, to cut it short:
Let me know if something is unclear.
Use the "ratio method" if you are trying to get a black-box estimate of the complexity. For instance: if you sit in a tight loop doing a fixed length job, like inserting a random record into a database, you record a timestamp at the end of each iteration. The timestamps will start to be farther apart as more data goes in. So, then graph the time difference between contiguous timestamps.
If you divide that graph by lg[n] and it continues to rise, then it's worse than lg[n]. Try dividing by lg[n], n, n lg[n], n^2, etc. When you divide by a function that is too high an estimate, the plot will trend to zero. When you divide by a function that is too low, the plot will continue to climb. When you have a good estimate, there is a point in your data set beyond which the graph wanders between an upper and a lower bound for as far out as you care to check.
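Here is a sketch of that bookkeeping, with a hypothetical `insert_record` as the fixed-length job and matplotlib for the plots; dividing the per-iteration deltas by each candidate function and watching which curve levels off is the whole trick:

```python
import math
import time

import matplotlib.pyplot as plt

def insert_record(store, i):
    # Hypothetical fixed-length job; deliberately O(k) in the current
    # size k of the store so the ratio plots have something to show.
    store.append(i)
    return sum(store) % 7

# Tight loop: record a timestamp at the end of every iteration.
store, timestamps = [], []
for i in range(1, 5_000):
    insert_record(store, i)
    timestamps.append(time.perf_counter())

# Time between consecutive timestamps = cost of one iteration.
deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
ns = range(2, len(deltas) + 2)

# Divide the deltas by candidate growth functions; a curve that neither
# climbs nor trends to zero points at the right estimate.
candidates = {
    "lg n": lambda n: math.log2(n),
    "n": lambda n: n,
    "n lg n": lambda n: n * math.log2(n),
    "n^2": lambda n: n * n,
}
for name, f in candidates.items():
    plt.plot(ns, [d / f(n) for n, d in zip(ns, deltas)], label=f"delta / {name}")
plt.xlabel("iteration")
plt.yscale("log")
plt.legend()
plt.show()
```

Expect a lot of noise in the raw deltas; smoothing them with a running average before dividing makes the trends much easier to read.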
I'm not aware of any software for this, or previous work done on it. And, fundamentally, I don't think you can get answers of the form "O(whatever)" that are trustworthy. Your measurements are noisy, you might be trying to distinguish n log(n) operations from n sqrt(n) operations, and unlike a nice clean mathematical analysis, all of the dropped constants are still floating around messing with you.
That said, the process I would go through if I wanted to come up with a best estimate:
Suppose you run the following in a loop. At iteration i = 0, 1, 2, ..., for some fixed n_0 > 0 and very large n, you sample the function at 2i + n_0 equally spaced (up to rounding) points in the range 1, ..., n. You then do either one of the following, or a combination of both:
1. Train a spline using the even points and test it on the odd points (and also vice versa). Decide that the iteration is enough if the L2 error is below some threshold (a rough sketch of this check appears at the end of this answer).
2. Train a spline using all the points, and test it on the values at, say, 2n. Again, decide that the iteration is enough if the L2 error is below some threshold.
Point 1 emphasizes the interpolation error, and point 2 emphasizes the extrapolation error. Realistically speaking, I think you will at best be able to identify functions that are well described by a spline.
Depending on the fitting method you use, you might need to fit some meta-parameters for the spline method. In this case, you might need more than ~2i samples per iteration, as you might need some of them for parameter-tuning cross-validation.
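Here is the rough sketch of point 1 promised above: the measured running times come from a hypothetical `measure(n)` (replace it with actual timings of your implementation), and the spline fitting uses scipy's `CubicSpline`:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def measure(n):
    # Hypothetical measured running time for input size n; in practice
    # this would be an actual timing of the implementation.
    return n * np.log2(n) * 1e-7 + np.random.normal(scale=1e-5)

n_max = 1_000_000   # the "very large n"
n_0 = 8             # initial number of sample points
threshold = 1e-4    # stop once the interpolation error drops below this

for i in range(20):
    # 2*i + n_0 equally spaced (up to rounding) sizes in 1, ..., n_max.
    sizes = np.linspace(1, n_max, 2 * i + n_0).round().astype(int)
    times = np.array([measure(n) for n in sizes])

    # Point 1: train a spline on the even-indexed points and test it
    # on the odd-indexed ones, using the L2 error as the stopping rule.
    spline = CubicSpline(sizes[::2], times[::2])
    l2_error = np.sqrt(np.mean((spline(sizes[1::2]) - times[1::2]) ** 2))

    print(f"iteration {i}: {len(sizes)} samples, interpolation L2 error {l2_error:.3e}")
    if l2_error < threshold:
        break
```

Point 2 would be the same loop with the spline trained on all of the points and evaluated against a fresh measurement at, say, 2·n_max.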