Is it possible to calculate the median of a list of numbers in better than O(n log n)?

攒了一身酷 2021-02-14 04:30

I know that it is possible to calculate the mean of a list of numbers in O(n). But what about the median? Is there any better algorithm than sorting (O(n log n)) and looking up the middle ...

7 answers
  • 2021-02-14 04:37

    If the numbers are discrete (e.g. integers) and there is a manageable number B of distinct values, you can use a "bucket sort", which is O(N), then iterate over the buckets to find the one that holds the median. The complete calculation is O(N + B) in time and O(B) in space.
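
    A minimal Python sketch of this bucket idea, assuming a non-empty list of integers with a small value range. The name `bucket_median` is made up for illustration, and for even-length input it returns the lower of the two middle values rather than their average.

```python
from collections import Counter


def bucket_median(values):
    """Median of a non-empty list of integers using one bucket per value.

    Assumption: for even-length input this returns the lower middle value.
    """
    counts = Counter(values)             # one pass over the data: O(N)
    target = (len(values) - 1) // 2      # 0-based rank of the (lower) median
    seen = 0
    # Walk the buckets in increasing value order until the target rank is covered.
    for v in range(min(counts), max(counts) + 1):
        seen += counts[v]                # a Counter returns 0 for absent values
        if seen > target:
            return v


print(bucket_median([3, 1, 2, 2, 5]))
```

    The loop visits every integer between the minimum and the maximum, so this only pays off when that range is small compared to N.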

  • 2021-02-14 04:38

    Somewhat off-topic, but here is a quick tip on how to find answers to common basic questions like this on the web.

    • We're talking about medians? So go to the page about medians on Wikipedia.
    • Search the page for "algorithm":

    Efficient computation of the sample median

    Even though sorting n items takes in general O(n log n) operations, by using a "divide and conquer" algorithm the median of n items can be computed with only O(n) operations (in fact, you can always find the k-th element of a list of values with this method; this is called the selection problem).

    • Follow the link to the selection problem for the description of algorithm. Read intro:

    ... There are worst-case linear time selection algorithms. ...

    • And if you're interested read about the actual ingenious algorithm.
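
    For a feel of how selection avoids a full sort, here is a minimal Python sketch of quickselect. Note that this is the simpler randomized variant, which is O(n) only in expectation; it is not the worst-case linear algorithm the Wikipedia text refers to. The function names `quickselect` and `median` are chosen for illustration.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-based) of a non-empty list."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows                      # answer lies among the smaller elements
        elif k < len(lows) + n_equal:
            return pivot                      # answer is the pivot itself
        else:
            k -= len(lows) + n_equal          # skip everything <= pivot
            items = highs


def median(values):
    """Median via selection; averages the two middle values for even lengths."""
    n = len(values)
    if n % 2:
        return quickselect(values, n // 2)
    return (quickselect(values, n // 2 - 1) + quickselect(values, n // 2)) / 2


print(median([5, 1, 4, 2, 3]))
```

    Each round discards the part of the list that cannot contain the answer, which is why the expected total work is linear rather than n log n.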
  • 2021-02-14 04:42

    Try a randomized algorithm. The sample size (e.g. 2000) is independent of the data size n, and you can still get sufficiently high (99%) accuracy. If you need higher accuracy, just increase the sample size. The Chernoff bound can be used to prove the success probability for a given sample size. I've written some JavaScript code that implements the algorithm; feel free to take it. http://www.sfu.ca/~wpa10
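
    The JavaScript code mentioned above is not reproduced here. As a rough Python sketch of the same sampling idea (the name `approx_median` and its parameters are assumptions for illustration, not the author's code), one can take the median of a fixed-size random sample:

```python
import random
import statistics


def approx_median(values, sample_size=2000, seed=None):
    """Approximate the median of a non-empty list from a random sample.

    The result is an estimate: its rank in the full data is close to the
    middle with high probability, but it is generally not the exact median.
    """
    if len(values) <= sample_size:
        return statistics.median(values)        # small input: just compute it exactly
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)    # sampling without replacement
    return statistics.median(sample)


data = list(range(100001))
print(approx_median(data, seed=0))
```

    The work done on the sample does not grow with n, so after the sample is drawn the cost is constant in the data size; a larger `sample_size` tightens the estimate.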

  • 2021-02-14 04:56

    This link has popped up recently on calculating median: http://matpalm.com/median/question.html .

    In general I think you can't go beyond O(n log n) time, but I don't have any proof on that :). No matter how much you make it parallel, aggregating the results into a single value takes at least log n levels of execution.

  • 2021-02-14 04:59

    Just for fun (and who knows, it may be faster), there's another randomized median algorithm, explained technically in Mitzenmacher and Upfal's book. Basically, you choose a polynomially smaller subset of the list (with some fancy bookkeeping) such that it probably contains the real median, and then use it to find the real median. The book is on Google Books, and here's a link. Note: I was able to read the pages describing the algorithm, so assuming that Google Books reveals the same pages to everyone, you can read them too.

    It is a randomized algorithm such that if it finds an answer, it is 100% certain that the answer is correct (this is called Las Vegas style). The randomness shows up in the runtime: occasionally (with probability 1/(sqrt(n)), I think) it FAILS to find the median and must be re-run.

    Asymptotically, it is exactly linear when you take the chance of failure into account. That is to say, a single run is a wee bit less than linear, exactly such that when you account for the number of times you may need to re-run it, it becomes linear.

    Note: I'm not saying this is better or worse --- I certainly haven't done a real-life runtime comparison between these algorithms! I'm simply presenting an additional algorithm that has linear runtime, but works in a significantly different way.
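
    A Python sketch of the sample-then-verify idea described above, written from memory of the textbook outline rather than copied from it. The sample size n^(3/4), the sqrt(n) margin, the 4·n^(3/4) size cap, the retry limit, and the fallback to sorting are all assumptions of this sketch, and the function names are made up. It returns the lower middle value for even-length input.

```python
import math
import random


def try_randomized_median(values, rng):
    """One attempt. Returns (True, median) on success or (False, None) on failure."""
    n = len(values)
    k = (n - 1) // 2 + 1                         # 1-based rank of the (lower) median
    r = math.ceil(n ** 0.75)                     # sample size, much smaller than n
    sample = sorted(rng.choice(values) for _ in range(r))
    # Pick two sample elements that probably bracket the true median.
    lo_rank = max(1, math.floor(n ** 0.75 / 2 - math.sqrt(n)))
    hi_rank = min(r, math.ceil(n ** 0.75 / 2 + math.sqrt(n)))
    d, u = sample[lo_rank - 1], sample[hi_rank - 1]
    below = sum(1 for x in values if x < d)      # elements strictly below the bracket
    middle = [x for x in values if d <= x <= u]  # elements inside the bracket
    if not (below < k <= below + len(middle)):
        return False, None                       # median is outside the bracket
    if len(middle) > 4 * r:
        return False, None                       # bracket too large to sort cheaply
    middle.sort()
    return True, middle[k - below - 1]


def randomized_median(values, seed=None, max_tries=10):
    """Las Vegas style: any returned value is the exact (lower) median."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        ok, m = try_randomized_median(values, rng)
        if ok:
            return m
    # Fallback (an assumption of this sketch), e.g. for inputs full of duplicates.
    return sorted(values)[(len(values) - 1) // 2]


data = [(i * 37) % 1009 for i in range(1009)]    # a shuffled copy of 0..1008
print(randomized_median(data, seed=1))
```

    The rank check before sorting `middle` is what makes a successful attempt certain to be correct: the elements below `d`, the sorted bracket, and the elements above `u` together form the fully sorted list.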

  • 2021-02-14 05:02

    Yes. You can do it (deterministically) in O(n).
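
    The answer does not name an algorithm. One well-known deterministic O(n) method is median of medians selection, so here is a minimal Python sketch under that assumption; the names `select` and `deterministic_median` are made up for illustration, and the lower middle value is returned for even-length input.

```python
def select(values, k):
    """Return the k-th smallest element (0-based) in worst-case linear time."""
    items = list(values)
    while True:
        if len(items) <= 5:
            return sorted(items)[k]
        # Median of each group of at most 5 elements.
        groups = [sorted(items[i:i + 5]) for i in range(0, len(items), 5)]
        medians = [g[len(g) // 2] for g in groups]
        # Recursively pick the median of those medians as the pivot.
        pivot = select(medians, len(medians) // 2)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            items = highs


def deterministic_median(values):
    """Lower median of a non-empty list, with no randomness involved."""
    return select(values, (len(values) - 1) // 2)


data = [(i * 37) % 101 for i in range(101)]      # a shuffled copy of 0..100
print(deterministic_median(data))
```

    Choosing the pivot this way guarantees that a constant fraction of the elements is discarded in every round, which is what removes the bad worst case of plain quickselect.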
