Empirically estimating big-oh time efficiency

清酒与你 2020-12-23 16:52

Background

I'd like to estimate the big-oh performance of some methods in a library through benchmarks. I don't need precision -- it suffices to show that something …

10 answers
  • 2020-12-23 17:01

    We have recently implemented a tool that does semi-automated average-case runtime analysis for JVM code. You do not even need access to the sources. It is not published yet (we are still ironing out some usability flaws), but it will be soon, I hope.

    It is based on maximum-likelihood models of program execution [1]. In short, byte code is augmented with cost counters. The target algorithm is then run (distributed, if you want) on a bunch of inputs whose distribution you control. The aggregated counters are extrapolated to functions using involved heuristics (method of least squares on crack, sort of). From those, more science leads to an estimate for the average runtime asymptotics (3.576n - 1.23log(n) + 1.7, for instance). For example, the method is able to reproduce rigorous classic analyses done by Knuth and Sedgewick with high precision.
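
    To make the counter idea concrete: below is a generic sketch of counting an abstract cost (comparisons) instead of wall-clock time and checking how the aggregate scales. It only illustrates the general principle, not the tool or its interface; all names are mine.

    import scala.util.Random

    object CostCounterSketch {
      // Count the comparisons performed while sorting n random Ints.
      def comparisonsToSort(n: Int): Long = {
        var count = 0L
        val counting = new Ordering[Int] {
          def compare(a: Int, b: Int): Int = { count += 1; Integer.compare(a, b) }
        }
        Vector.fill(n)(Random.nextInt()).sorted(counting)  // run the target operation
        count
      }

      def main(args: Array[String]): Unit = {
        // For an n log n sort, count / (n * log2 n) should stay roughly constant.
        for (n <- Seq(1000, 10000, 100000)) {
          val c = comparisonsToSort(n)
          println(f"n=$n%7d  comparisons=$c%9d  ratio=${c / (n * math.log(n) / math.log(2))}%.3f")
        }
      }
    }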

    The big advantage of this method compared to what others post is that you are independent of time estimates, that is in particular independent of machine, virtual machine and even programming language. You really get information about your algorithm, without all the noise.

    And---probably the killer feature---it comes with a complete GUI that guides you through the whole process.

    See my answer on cs.SE for a little more detail and further references. You can find a preliminary website (including a beta version of the tool and the papers published) here.

    (Note that average-case runtime can be estimated this way, while worst-case runtime can never be, except when you already know the worst case. If you do, you can use the average-case machinery for worst-case analysis: just feed the tool only worst-case instances. In general, runtime bounds cannot be decided, though.)


    1. U. Laube and M. E. Nebel, "Maximum Likelihood Analysis of Algorithms and Data Structures" (2010). [preprint]
  • 2020-12-23 17:02

    I don't think your approach will work in general.

    The problem is that "big O" complexity is based on a limit as some scaling variable tends to infinity. For smaller values of that variable, the performance behavior can appear to fit a different curve entirely.

    The problem is that with an empirical approach you can never know if the scaling variable is large enough for the limit to be apparent in the results.
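
    As a sketch of why that matters, suppose the true cost were T(n) = 1e6*n + n^2 (a made-up cost function, purely for illustration). An exponent estimated empirically by doubling the input reads as roughly 1 at every size you could realistically benchmark:

    // Hypothetical quadratic cost with a large linear term.
    def cost(n: Double): Double = 1e6 * n + n * n

    // Apparent exponent from a doubling experiment: log2( cost(2n) / cost(n) ).
    // The true asymptotic exponent is 2, but it only shows up for enormous n.
    for (n <- Seq(1e3, 1e4, 1e5, 1e8)) {
      val apparent = math.log(cost(2 * n) / cost(n)) / math.log(2)
      println(f"n=$n%1.0e  apparent exponent = $apparent%.2f")
    }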

    Another problem is that if you implement this in Java / Scala, you have to go to considerable lengths to eliminate distortions and "noise" in your timings due to things like JVM warmup (e.g. class loading, JIT compilation, heap resizing) and garbage collection.

    Finally, nobody is going to place much trust in empirical estimates of complexity. Or at least, they wouldn't if they understood the mathematics of complexity analysis.


    FOLLOWUP

    In response to this comment:

    "Your estimate's significance will improve drastically the more and larger samples you use."

    This is true, though my point is that you (Daniel) haven't factored this in.

    "Also, runtime functions typically have special characteristics which can be exploited; for example, algorithms tend to not change their behaviour at some huge n."

    For simple cases, yes.

    For complicated cases and real world cases, that is a dubious assumption. For example:

    • Suppose some algorithm uses a hash table with a large but fixed-size primary hash array, and uses external lists to deal with collisions. For N (== number of entries) less than the size of the primary hash array, the behaviour of most operations will appear to be O(1). The true O(N) behaviour can only be detected by curve fitting when N gets much larger than that (see the sketch after this list).

    • Suppose that the algorithm uses a lot of memory or network bandwidth. Typically, it will work well until you hit the resource limit, and then performance will tail off badly. How do you account for this? If it is part of the "empirical complexity", how do you make sure that you get to the transition point? If you want to exclude it, how do you do that?
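
    Here is a minimal sketch of that first scenario (the class and all names are hypothetical, added for illustration): a map with a fixed number of primary buckets and list-based collision chains. Lookups appear O(1) until the number of entries passes the bucket count, after which chain length -- and hence cost -- grows linearly with N.

    // Fixed number of primary buckets; collisions go into per-bucket lists.
    class FixedBucketMap[K, V](numBuckets: Int = 1 << 16) {
      private val buckets = Array.fill(numBuckets)(List.empty[(K, V)])
      private def idx(k: K): Int = math.abs(k.hashCode % numBuckets)

      def put(k: K, v: V): Unit = {
        val i = idx(k)
        buckets(i) = (k, v) :: buckets(i).filterNot(_._1 == k)
      }

      // The cost of get is one chain length: about 1 while N < numBuckets,
      // about N / numBuckets once N is much larger -- i.e. O(N) in the limit.
      def get(k: K): Option[V] = buckets(idx(k)).collectFirst { case (`k`, v) => v }
    }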

  • 2020-12-23 17:04

    You should consider changing a critical aspect of your task.

    Change the terminology that you are using to "estimate the runtime of the algorithm" or "set up performance regression testing".

    Can you estimate the runtime of the algorithm? Well, you propose to try different input sizes and measure either some critical operation or the time it takes. Then, for the series of input sizes, you plan to programmatically estimate whether the algorithm's runtime shows no growth, constant growth, exponential growth, etc.

    So you have two problems: running the tests, and programmatically estimating the growth rate as your input set grows. This sounds like a reasonable task.
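
    One simple way to do the programmatic estimate is a doubling experiment: time the operation at n and at 2n and look at the ratio. This is a generic sketch of that idea (not code from the question or this answer); a ratio near 1 suggests constant or logarithmic growth, near 2 linear, near 4 quadratic.

    // Time a block in seconds (crude; repetition is the only concession to JVM noise).
    def timeIt[A](body: => A): Double = {
      val t0 = System.nanoTime
      body
      (System.nanoTime - t0) * 1e-9
    }

    // Ratio of run times at 2n versus n, taking the minimum of a few repetitions
    // to damp JIT and GC noise a little.
    def doublingRatio[A](f: Int => A, n: Int, reps: Int = 5): Double = {
      val tn  = (1 to reps).map(_ => timeIt(f(n))).min
      val t2n = (1 to reps).map(_ => timeIt(f(2 * n))).min
      t2n / tn
    }

    // Example: summing a range is linear, so the ratio should hover around 2.
    // doublingRatio(m => (1 to m).map(_.toLong).sum, 1 << 20)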

  • 2020-12-23 17:13

    I'm not sure I understand 100% what you want. But I understand that you are testing your own code, so you can modify it, e.g. by injecting observation statements. Otherwise you could use some form of aspect weaving.

    How about adding resettable counters to your data structures and then increasing them each time a particular sub-function is invoked? You could make the counting methods @elidable so they will be gone in the deployed library.

    Then, for a given method, say delete(x), you would test it with all sorts of automatically generated data sets, trying to give them some skew, etc., and gather the counts. While, as Igor points out, you cannot verify that the data structure will never violate a big-O bound, you will at least be able to assert that in the actual experiment a given limit count is never exceeded (e.g. going down a node in a tree is never done more than 4 * log(n) times) -- so you can detect some mistakes.
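
    A minimal sketch of such a counter (the object and its names are mine, not from this answer); the @elidable calls disappear when the library is compiled with a high enough -Xelide-below setting, e.g. -Xelide-below SEVERE:

    import scala.annotation.elidable
    import scala.annotation.elidable.FINE

    // Resettable step counter; tick/reset calls are compiled away in deployed builds.
    object OpCounter {
      private var steps = 0L
      @elidable(FINE) def reset(): Unit = { steps = 0L }
      @elidable(FINE) def tick(): Unit = { steps += 1 }
      def count: Long = steps
    }

    // Inside the data structure, call OpCounter.tick() whenever a node is visited.
    // A test of delete(x) on a structure of size n could then assert an explicit bound:
    //   OpCounter.reset()
    //   tree.delete(x)
    //   assert(OpCounter.count <= 4 * math.log(n) / math.log(2), s"too many steps for n=$n")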

    Of course, you would need certain assumptions, e.g. that calling a method is O(1) in your computer model.

  • 2020-12-23 17:14

    What you are looking to achieve is impossible in general. Even the fact that an algorithm will ever stop cannot be proven in the general case (see the Halting Problem). And even if it does stop on your data, you still cannot deduce the complexity by running it. For instance, bubble sort has complexity O(n^2), while on already-sorted data it performs as if it were O(n). There is no way to select "appropriate" data for an unknown algorithm to estimate its worst case.
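
    As a sketch of that bubble-sort point (added for illustration, not code from the answer): the usual early-exit variant does O(n^2) work in the worst case but finishes after a single O(n) pass on already-sorted input, so a benchmark fed sorted data would wrongly suggest linear behaviour.

    // Bubble sort with an early-exit flag: a pass with no swaps ends the sort,
    // so an already-sorted array costs just one O(n) scan.
    def bubbleSort(a: Array[Int]): Unit = {
      var swapped = true
      var upper = a.length - 1
      while (swapped && upper > 0) {
        swapped = false
        var i = 0
        while (i < upper) {
          if (a(i) > a(i + 1)) {
            val t = a(i); a(i) = a(i + 1); a(i + 1) = t
            swapped = true
          }
          i += 1
        }
        upper -= 1
      }
    }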

  • 2020-12-23 17:16

    In order to get started, you have to make a couple of assumptions.

    1. n is large compared to any constant terms.
    2. You can effectively randomize your input data.
    3. You can sample with sufficient density to get a good handle on the distribution of runtimes.

    In particular, (3) is difficult to achieve in concert with (1). So you may get something with an exponential worst case, never run into that worst case, and thus think your algorithm is much better on average than it really is.

    With that said, all you need is any standard curve fitting library. Apache Commons Math has a fully adequate one. You then either create a function with all the common terms that you want to test (e.g. constant, log n, n, n log n, n*n, n*n*n, e^n), or you take the log of your data and fit the exponent, and then, if you get an exponent not close to an integer, see if throwing in a log n gives a better fit.

    (In more detail, if you fit C*x^a for C and a, or more easily log C + a log x, you can get the exponent a; in the all-common-terms-at-once scheme, you'll get weights for each term, so if you have n*n + C*n*log(n) where C is large, you'll pick up that term also.)
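
    As an illustration of the "log C + a log x" route (a sketch only, assuming commons-math3 is on the classpath; this is not code from the answer), a simple linear regression on log-log data gives the exponent as the slope:

    import org.apache.commons.math3.stat.regression.SimpleRegression

    // Fit log t = log C + a * log n and return (a, C).
    def fitExponent(samples: Seq[(Int, Double)]): (Double, Double) = {
      val reg = new SimpleRegression()
      samples.foreach { case (n, t) => reg.addData(math.log(n), math.log(t)) }
      (reg.getSlope, math.exp(reg.getIntercept))
    }

    // Made-up, roughly linear timings: the reported exponent should come out near 1.
    // fitExponent(Seq(1000 -> 0.0012, 2000 -> 0.0025, 4000 -> 0.0051, 8000 -> 0.0103))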

    You'll want to vary the size by enough so that you can tell the different cases apart (which might be hard with log terms, if you care about those), and to use safely more different sizes than you have parameters (probably a 3x excess would start being okay, as long as you do at least a dozen or so runs total).


    Edit: Here is Scala code that does all this for you. Rather than explain each little piece, I'll leave it to you to investigate; it implements the scheme above using the C*x^a fit, and returns ((a,C),(lower bound for a, upper bound for a)). The bounds are quite conservative, as you can see from running the thing a few times. The units of C are seconds (a is unitless), but don't trust that too much as there is some looping overhead (and also some noise).

    class TimeLord[A: ClassManifest,B: ClassManifest](setup: Int => A, static: Boolean = true)(run: A => B) {
      // Increase the repetition count (via `step`) until one timed batch takes at least `time` seconds;
      // returns (repetitions, elapsed seconds).
      @annotation.tailrec final def exceed(time: Double, size: Int, step: Int => Int = _*2, first: Int = 1): (Int,Double) = {
        val elapsed = 1e-9 * {
          if (static) {
            val a = setup(size)
            var b: B = null.asInstanceOf[B]
            val t0 = System.nanoTime
            var i = 0
            while (i < first) {
              b = run(a)
              i += 1
            }
            System.nanoTime - t0
          }
          else {
            // Non-static data: build a fresh input for every repetition before timing.
            val starts = Array.fill(first)(setup(size))
            val answers = new Array[B](first)
            val t0 = System.nanoTime
            var i = 0
            while (i < first) {
              answers(i) = run(starts(i))
              i += 1
            }
            System.nanoTime - t0
          }
        }
        if (time > elapsed) {
          val second = step(first)
          if (second <= first) throw new IllegalArgumentException("Iteration size increase failed: %d to %d".format(first,second))
          else exceed(time, size, step, second)
        }
        else (first, elapsed)
      }
    
      def multibench(smallest: Int, largest: Int, time: Double, n: Int, m: Int = 1) = {
        if (m < 1 || n < 1 || largest < smallest || (n>1 && largest==smallest)) throw new IllegalArgumentException("Poor choice of sizes")
        val frac = (largest.toDouble)/smallest
        (0 until n).map(x => (smallest*math.pow(frac,x/((n-1).toDouble))).toInt).map{ i => 
          val (k,dt) = exceed(time,i)
          if (m==1) i -> Array(dt/k) else {
            i -> ( (dt/k) +: (1 until m).map(_ => exceed(time,i,first=k)).map{ case (j,dt2) => dt2/j }.toArray )
          }
        }.foldLeft(Vector[(Int,Array[Double])]()){ (acc,x) =>
          if (acc.length==0 || acc.last._1 != x._1) acc :+ x
          else acc.dropRight(1) :+ (x._1, acc.last._2 ++ x._2)
        }
      }
    
      def alpha(data: Seq[(Int,Array[Double])]) = {
        // Use Theil-Sen estimator for calculation of straight-line fit for exponent
        // Assume timing relationship is t(n) = A*n^alpha
        val dat = data.map{ case (i,ad) => math.log(i) -> ad.map(x => math.log(i) -> math.log(x)) }
        val slopes = (for {
          i <- dat.indices
          j <- ((i+1) until dat.length)
          (pi,px) <- dat(i)._2
          (qi,qx) <- dat(j)._2
        } yield (qx - px)/(qi - pi)).sorted
        val mbest = slopes(slopes.length/2)
        val mp05 = slopes(slopes.length/20)
        val mp95 = slopes(slopes.length-(1+slopes.length/20))
        val intercepts = dat.flatMap{ case (i,a) => a.map{ case (li,lx) => lx - li*mbest } }.sorted
        val bbest = intercepts(intercepts.length/2)
        ((mbest,math.exp(bbest)),(mp05,mp95))
      }
    }
    

    Note that the multibench method is expected to take about sqrt(2)*n*m*time to run, assuming that static initialization data is used and is relatively cheap compared to whatever you're running. Here are some examples with parameters chosen to take ~15s to run:

    val tl1 = new TimeLord(x => List.range(0,x))(_.sum)  // Should be linear
    // Try list sizes 100 to 10000, with each run taking at least 0.1s;
    // use 10 different sizes and 10 repeats of each size
    scala> tl1.alpha( tl1.multibench(100,10000,0.1,10,10) )
    res0: ((Double, Double), (Double, Double)) = ((1.0075537890632216,7.061397125245351E-9),(0.8763463348353099,1.102663784225697))
    
    val longList = List.range(0,100000)
    val tl2 = new TimeLord(x=>x)(longList.apply)    // Again, should be linear
    scala> tl2.alpha( tl2.multibench(100,10000,0.1,10,10) )
    res1: ((Double, Double), (Double, Double)) = ((1.4534378213477026,1.1325696181862922E-10),(0.969955396265306,1.8294175293676322))
    
    // 1.45?!  That's not linear.  Maybe the short ones are cached?
    scala> tl2.alpha( tl2.multibench(9000,90000,0.1,100,1) )
    res2: ((Double, Double), (Double, Double)) = ((0.9973235607566956,1.9214696731124573E-9),(0.9486294398193154,1.0365312207345019))
    
    // Let's try some sorting
    val tl3 = new TimeLord(x=>Vector.fill(x)(util.Random.nextInt))(_.sorted)
    scala> tl3.alpha( tl3.multibench(100,10000,0.1,10,10) )
    res3: ((Double, Double), (Double, Double)) = ((1.1713142886974603,3.882658025586512E-8),(1.0521099621639414,1.3392622111121666))
    // Note the log(n) term comes out as a fractional power
    // (which will decrease as the sizes increase)
    
    // Maybe sort some arrays?
    // This may take longer to run because we have to recreate the (mutable) array each time
    val tl4 = new TimeLord(x=>Array.fill(x)(util.Random.nextInt), false)(java.util.Arrays.sort)
    scala> tl4.alpha( tl4.multibench(100,10000,0.1,10,10) )
    res4: ((Double, Double), (Double, Double)) = ((1.1216172965292541,2.2206198821180513E-8),(1.0929414090177318,1.1543697719880128))
    
    // Let's time something slow
    def kube(n: Int) = (for (i <- 1 to n; j <- 1 to n; k <- 1 to n) yield 1).sum
    val tl5 = new TimeLord(x=>x)(kube)
    scala> tl5.alpha( tl5.multibench(10,100,0.1,10,10) )
    res5: ((Double, Double), (Double, Double)) = ((2.8456382116915484,1.0433534274508799E-7),(2.6416659356198617,2.999094292838751))
    // Okay, we're a little short of 3; there's constant overhead on the small sizes
    

    Anyway, for the stated use case--where you are checking to make sure the order doesn't change--this is probably adequate, since you can play with the values a bit when setting up the test to make sure they give something sensible. One could also create heuristics that search for stability, but that's probably overkill.

    (Incidentally, there is no explicit warmup step here; the robust fitting of the Theil-Sen estimator should make it unnecessary for sensibly large benchmarks. This is also why I don't use any other benchmarking framework; any statistics it applies would just lose power relative to this test.)


    Edit again: if you replace the alpha method with the following:

      // We'll need this math
      @inline private[this] def sq(x: Double) = x*x
      final private[this] val inv_log_of_2 = 1/math.log(2)
      @inline private[this] def log2(x: Double) = math.log(x)*inv_log_of_2
      import math.{log,exp,pow}
    
      // All the info you need to calculate a y value, e.g. y = x*m+b
      case class Yp(x: Double, m: Double, b: Double) {}
    
      // Estimators for data order
      //   fx = transformation to apply to x-data before linear fitting
      //   fy = transformation to apply to y-data before linear fitting
      //   model = given x, slope, and intercept, calculate predicted y
      case class Estimator(fx: Double => Double, invfx: Double=> Double, fy: (Double,Double) => Double, model: Yp => Double) {}
      // C*n^alpha
      val alpha = Estimator(log, exp, (x,y) => log(y), p => p.b*pow(p.x,p.m))
      // C*log(n)*n^alpha
      val logalpha = Estimator(log, exp, (x,y) => log(y/log2(x)), p => p.b*log2(p.x)*pow(p.x,p.m))
    
      // Use Theil-Sen estimator for calculation of straight-line fit
      case class Fit(slope: Double, const: Double, bounds: (Double,Double), fracrms: Double) {}
      def theilsen(data: Seq[(Int,Array[Double])], est: Estimator = alpha) = {
        // Use Theil-Sen estimator for calculation of straight-line fit for exponent
        // Assume timing relationship is t(n) = A*n^alpha
        val dat = data.map{ case (i,ad) => ad.map(x => est.fx(i) -> est.fy(i,x)) }
        val slopes = (for {
          i <- dat.indices
          j <- ((i+1) until dat.length)
          (pi,px) <- dat(i)
          (qi,qx) <- dat(j)
        } yield (qx - px)/(qi - pi)).sorted
        val mbest = slopes(slopes.length/2)
        val mp05 = slopes(slopes.length/20)
        val mp95 = slopes(slopes.length-(1+slopes.length/20))
        val intercepts = dat.flatMap{ _.map{ case (li,lx) => lx - li*mbest } }.sorted
        val bbest = est.invfx(intercepts(intercepts.length/2))
        val fracrms = math.sqrt(data.map{ case (x,ys) => ys.map(y => sq(1 - y/est.model(Yp(x,mbest,bbest)))).sum }.sum / data.map(_._2.length).sum)
        Fit(mbest, bbest, (mp05,mp95), fracrms)
      }
    

    then you can get an estimate of the exponent even when there's a log term. The error estimates help you pick whether the log term is the better fit or not, but it's up to you to make the call (i.e. I'm assuming you'll be supervising this initially and reading the numbers that come off):

    val tl3 = new TimeLord(x=>Vector.fill(x)(util.Random.nextInt))(_.sorted)
    val timings = tl3.multibench(100,10000,0.1,10,10)
    
    // Regular n^alpha fit
    scala> tl3.theilsen( timings )
    res20: tl3.Fit = Fit(1.1811648421030059,3.353753446942075E-8,(1.1100382697696545,1.3204652930525234),0.05927994882343982)
    
    // log(n)*n^alpha fit--note first value is closer to an integer
    //   and last value (error) is smaller
    scala> tl3.theilsen( timings, tl3.logalpha )
    res21: tl3.Fit = Fit(1.0369167329732445,9.211366397621766E-9,(0.9722967182484441,1.129869067913768),0.04026308919615681)
    

    (Edit: fixed the RMS computation so it's actually the mean, plus demonstrated that you only need to do timings once and can then try both fits.)
