Programmatically obtaining Big-O efficiency of code


I wonder whether there is any automatic way of determining (at least roughly) the Big-O time complexity of a given function?

If I graphed an O(n) function vs. one with a higher growth rate, I think I could tell the two apart visually, so presumably a program could make a similar comparison automatically.

18 Answers
  • 2020-11-27 16:28

    This could work for simple algorithms, but what about O(n^2 lg n), or O(n lg^2 n)?

    You could get fooled visually very easily.

    And if it's a really bad algorithm, maybe it wouldn't return even on n=10.

  • 2020-11-27 16:28

    I guess this isn't possible in a fully automatic way since the type and structure of the input differs a lot between functions.

  • 2020-11-27 16:32

    It's easy to get a rough indication (e.g. "is the function linear? sub-linear? polynomial? exponential?").

    It's hard to find the exact complexity.

    For example, here's a Python solution: you supply the function, plus a function that creates arguments of size N for it. You get back a list of (n, time) values to plot, or to run a regression on (a rough log-log fit is sketched at the end of this answer). It times each call only once, for speed; to get a really good indication it would have to time each one many times to minimize interference from environmental factors, e.g. with the timeit module (a variant is sketched right after the harness below).

    import time

    def measure_run_time(func, args):
      # Time a single call using a monotonic, high-resolution clock.
      start = time.perf_counter()
      func(*args)
      return time.perf_counter() - start

    def plot_times(func, generate_args, plot_sequence):
      # One (n, seconds) pair per input size n in plot_sequence.
      return [
        (n, measure_run_time(func, generate_args(n)))
        for n in plot_sequence
      ]
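
    The prose above mentions the timeit module; as a hedged variant (not part of the harness itself), measure_run_time could be swapped for something like this, which repeats each measurement and keeps the fastest run, the one least disturbed by other processes:

    import timeit

    def measure_run_time_repeated(func, args, repeats=5):
      # Time the call `repeats` times (one execution per timing) and keep the minimum.
      return min(timeit.repeat(lambda: func(*args), number=1, repeat=repeats))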
    

    And to use it to time bubble sort:

    def bubble_sort(l):
      # Classic O(n^2) bubble sort: repeatedly swap adjacent out-of-order elements.
      for i in range(len(l)-1):
        for j in range(len(l)-1-i):
          if l[j+1] < l[j]:
            l[j], l[j+1] = l[j+1], l[j]

    import random
    def gen_args_for_sort(list_length):
      result = list(range(list_length))  # list of 0..N-1
      random.shuffle(result)             # randomize order
      # should return a tuple of arguments
      return (result,)

    # timing for N = 1000, 2000, ..., 5000
    times = plot_times(bubble_sort, gen_args_for_sort, range(1000, 6000, 1000))

    import pprint
    pprint.pprint(times)
    

    This printed on my machine:

    [(1000, 0.078000068664550781),
     (2000, 0.34400010108947754),
     (3000, 0.7649998664855957),
     (4000, 1.3440001010894775),
     (5000, 2.1410000324249268)]
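
    As a rough sketch of the regression step mentioned above: if the running time grows like c*n^k, the slope of log(time) versus log(n) approximates the exponent k, so a simple log-log least-squares fit gives a quick estimate.

    import math

    def estimate_exponent(times):
      # Ordinary least squares on log(t) = k*log(n) + b; the slope k is roughly
      # the polynomial degree of the measured running time.
      xs = [math.log(n) for n, t in times]
      ys = [math.log(t) for n, t in times]
      mean_x = sum(xs) / len(xs)
      mean_y = sum(ys) / len(ys)
      return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
             / sum((x - mean_x) ** 2 for x in xs)

    print(estimate_exponent(times))  # roughly 2.0 for the bubble sort timings above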
    
  • 2020-11-27 16:33

    I am curious why you want to be able to do this. In my experience, when someone says "I want to ascertain the runtime complexity of this algorithm", they are not asking what they think they are asking. What you most likely want to know is the realistic performance of such an algorithm for likely data. Calculating the Big-O of a function is of reasonable utility, but there are so many aspects that can change the "real runtime performance" of an algorithm in real use that nothing beats instrumentation and testing.

    For example, the following algorithms have the same exact Big-O (wacky pseudocode):

    example a:

    huge_two_dimensional_array foo
    for i = 0; i < foo.length; i++
      for j = 0; j < foo[i].length; j++
        do_something_with foo[i][j]
    

    example b:

    huge_two_dimensional_array foo
    for j = 0; j < foo[0].length; j++
      for i = 0; i < foo.length; i++
        do_something_with foo[i][j]
    

    Again, exactly the same big-O... but one of them uses row ordinality and one of them uses column ordinality. It turns out that due to locality of reference and cache coherency you might have two completely different actual runtimes, especially depending on the actual size of the array foo. This doesn't even begin to touch the actual performance characteristics of how the algorithm behaves if it's part of a piece of software that has some concurrency built in.
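
    As one hedged way to see this from Python (NumPy is used here only because its contiguous C-order layout makes the locality effect easy to observe; the array size is an arbitrary choice):

    import time
    import numpy as np

    foo = np.random.rand(3000, 3000)  # C-order: each row is contiguous in memory

    def row_major(a):
      total = 0.0
      for i in range(a.shape[0]):   # walk memory sequentially, row by row
        total += a[i, :].sum()
      return total

    def col_major(a):
      total = 0.0
      for j in range(a.shape[1]):   # jump a full row-stride between elements
        total += a[:, j].sum()
      return total

    for f in (row_major, col_major):
      start = time.perf_counter()
      f(foo)
      print(f.__name__, time.perf_counter() - start)

    Both traversals do exactly the same amount of arithmetic, yet the column-order version is typically noticeably slower on large arrays.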

    Not to be a negative Nelly, but big-O is a tool with a narrow scope. It is of great use if you are deep into algorithmic analysis or trying to prove something about an algorithm, but if you are doing commercial software development the proof is in the pudding, and you will want actual performance numbers to make intelligent decisions.

    Cheers!

  • 2020-11-27 16:33

    Jeffrey L Whitledge is correct. A simple reduction from the halting problem proves that this is undecidable...
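
    A sketch of the idea behind that reduction, with the Collatz iteration standing in for an arbitrary program/input pair (the function below is illustrative only, not from the answer):

    def rigged(n, k=27):
      # Simulate "the program" (here: the Collatz rule starting from k) for at
      # most n steps.
      x, steps = k, 0
      while x != 1 and steps < n:
        x = 3 * x + 1 if x % 2 else x // 2
        steps += 1
      if x == 1:                  # the simulated computation halted...
        for _ in range(n * n):    # ...so burn quadratic extra time
          pass

    The running time in n is quadratic if the simulated computation halts and only linear if it never does, so a tool that could pin down the Big-O of arbitrary code could also decide whether arbitrary programs halt.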

    ALSO, if I could write this program, I'd use it to solve P vs NP and collect $1 million... B-)

  • 2020-11-27 16:35

    I'm using the big_O library, which fits the change in execution time against the independent variable n to infer the order-of-growth class O().

    The package automatically suggests the best-fitting class by measuring the residuals of the collected data against each candidate class's growth behavior (a minimal usage sketch follows the example output below).

    Check the code in this answer.

    Example output:

    Measuring .columns[::-1] complexity against rapid increase in # rows
    --------------------------------------------------------------------------------
    Big O() fits: Cubic: time = -0.017 + 0.00067*n^3
    --------------------------------------------------------------------------------
    Constant: time = 0.032                                        (res: 0.021)
    Linear: time = -0.051 + 0.024*n                               (res: 0.011)
    Quadratic: time = -0.026 + 0.0038*n^2                         (res: 0.0077)
    Cubic: time = -0.017 + 0.00067*n^3                            (res: 0.0052)
    Polynomial: time = -6.3 * x^1.5                               (res: 6)
    Logarithmic: time = -0.026 + 0.053*log(n)                     (res: 0.015)
    Linearithmic: time = -0.024 + 0.012*n*log(n)                  (res: 0.0094)
    Exponential: time = -7 * 0.66^n                               (res: 3.6)
    --------------------------------------------------------------------------------
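
    For reference, a minimal usage sketch of the big_O package (based on its documented interface; sorted and the integer generator below are arbitrary choices for the example, so check the package docs for the exact API):

    import big_o

    # Measure how sorted() scales with the length of a list of random integers.
    positive_ints = lambda n: big_o.datagen.integers(n, 0, 10000)
    best, others = big_o.big_o(sorted, positive_ints, n_repeats=20)
    print(best)                         # best-fitting complexity class
    for class_, residuals in others.items():
      print(class_, residuals)          # residuals for every candidate class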
    