Is it possible to calculate the median of a list of numbers in better than O(n log n)?

前端未结


I know that it is possible to calculate the mean of a list of numbers in O(n). But what about the median? Is there any better algorithm than sort (O(n log n)) and lookup middl


7 answers

栀梦

2021-02-14 04:37

If the numbers are discrete (e.g. integers) and there is a manageable number of distinct values, you can use a "bucket sort", which is O(N), and then iterate over the buckets to figure out which bucket holds the median. The complete calculation is O(N) in time and O(B) in space, where B is the number of buckets.
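
A minimal sketch of the bucket idea in Python, assuming the inputs are integers with a small range. The function name `bucket_median` is made up for illustration, and it returns the lower of the two middle values when the list has even length.

```python
def bucket_median(values):
    """Lower median of a list of integers via counting buckets."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    lo, hi = min(values), max(values)
    counts = [0] * (hi - lo + 1)          # one bucket per possible value: O(B) space
    for v in values:                      # O(N) pass to fill the buckets
        counts[v - lo] += 1
    target = (len(values) - 1) // 2       # 0-indexed rank of the lower median
    seen = 0
    for offset, count in enumerate(counts):
        seen += count
        if seen > target:                 # this bucket contains the target rank
            return lo + offset


print(bucket_median([3, 1, 2, 2, 5]))
```

The bucket scan adds O(B) work, so this only pays off when the value range is not much larger than N.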

一生所求

2021-02-14 04:38
Partially irrelevant, but here is a quick tip on how to find answers to common basic questions like this on the web.
- We're talking about medians? So go to the page about medians on Wikipedia.
- Search the page for "algorithm":
Efficient computation of the sample median

Even though sorting n items takes in general O(n log n) operations, by using a "divide and conquer" algorithm the median of n items can be computed with only O(n) operations (in fact, you can always find the k-th element of a list of values with this method; this is called the selection problem).
- Follow the link to the selection problem for the description of the algorithm. Read the intro:
... There are worst-case linear time selection algorithms. ...
- And if you're interested, read about the actual ingenious algorithm.
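
To make the "divide and conquer" selection idea concrete, here is a minimal quickselect sketch in Python. Note that it uses a random pivot, so it is linear only in expectation, not in the worst case; the worst-case linear algorithms mentioned in the quote choose the pivot more carefully. The names `quickselect` and `rng` are illustrative, not from the quoted article.

```python
import random


def quickselect(values, k, rng=random):
    """Return the k-th smallest element (0-indexed) in expected O(n) time."""
    items = list(values)
    if not 0 <= k < len(items):
        raise IndexError("k is out of range")
    while True:
        pivot = rng.choice(items)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows                  # answer is among the smaller elements
        elif k < len(lows) + n_equal:
            return pivot                  # answer equals the pivot
        else:
            k -= len(lows) + n_equal      # skip everything <= pivot
            items = highs


data = [7, 1, 5, 3, 9]
print(quickselect(data, (len(data) - 1) // 2))
```

Each round keeps only the side that contains rank k, which is what makes the total work a geometric series on average instead of the n log n of a full sort.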

故里飘歌

2021-02-14 04:42

Try the randomized algorithm: the sample size (e.g. 2000) is independent of the data size n, yet it still gives sufficiently high (99%) accuracy. If you need higher accuracy, just increase the sample size. A Chernoff bound can be used to prove the success probability for a given sample size. I've written some JavaScript code to implement the algorithm; feel free to take it. http://www.sfu.ca/~wpa10
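
This is not the JavaScript code linked above, just a rough Python sketch of the sampling idea as I read it: draw a fixed-size random sample and return the sample's median as an estimate. The function name, the `rng` parameter, and sampling with replacement are my assumptions. The result is an approximation whose rank is close to n/2 with high probability, not the exact median.

```python
import random


def approximate_median(values, sample_size=2000, rng=None):
    """Estimate the median from a random sample of fixed size."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    rng = rng or random.Random()
    n = len(values)
    if n <= sample_size:
        # Small input: sorting it directly is cheap and exact.
        return sorted(values)[(n - 1) // 2]
    # Sample with replacement; the cost depends on sample_size, not on n.
    sample = sorted(values[rng.randrange(n)] for _ in range(sample_size))
    return sample[(sample_size - 1) // 2]


data = list(range(100001))
random.Random(0).shuffle(data)
estimate = approximate_median(data, rng=random.Random(1))
print(estimate)  # a value near 50000; the exact number depends on the seed
```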

遇见更好的自我

2021-02-14 04:56

This link has popped up recently on calculating median: http://matpalm.com/median/question.html .

In general I think you can't go beyond O(n log n) time, but I don't have any proof of that :). No matter how much you parallelize it, aggregating the results into a single value takes at least log n levels of execution.

栀梦

2021-02-14 04:59

Just for fun (and who knows, it may be faster), there's another randomized median algorithm, explained technically in Mitzenmacher and Upfal's book. Basically, you choose a polynomially smaller subset of the list, chosen (with some fancy bookkeeping) such that it probably contains the real median, and then use it to find the real median. The book is on Google Books, and here's a link. Note: I was able to read the pages describing the algorithm, so assuming that Google Books reveals the same pages to everyone, you can read them too.

It is a randomized algorithm such that if it finds the answer, it is 100% certain that it is the correct answer (this is called Las Vegas style). The randomness shows up in the runtime: occasionally (with probability 1/sqrt(n), I think) it FAILS to find the median and must be re-run.

Asymptotically, it is exactly linear when you take into account the chance of failure. That is to say, a single run is a wee bit less than linear, exactly such that when you take into account the number of times you may need to re-run it, it becomes linear.

Note: I'm not saying this is better or worse --- I certainly haven't done a real-life runtime comparison between these algorithms! I'm simply presenting an additional algorithm that has linear runtime, but works in a significantly different way.
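
For the curious, here is a rough Python sketch of that sample-then-verify idea, written from memory rather than copied from the book. The sample size n^(3/4), the sqrt(n) margin, the 4 * n^(3/4) size limit, and the `max_tries` fallback to plain sorting are all my assumptions about reasonable constants. It returns the lower median.

```python
import math
import random


def las_vegas_median(values, rng=None, max_tries=20):
    """Lower median via sampling; any value returned is the exact median."""
    data = list(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    rng = rng or random.Random()
    k = (n - 1) // 2                      # 0-indexed rank of the lower median
    r = math.ceil(n ** 0.75)              # sample size, polynomially smaller than n
    root = math.sqrt(n)
    for _ in range(max_tries):
        sample = sorted(data[rng.randrange(n)] for _ in range(r))
        d = sample[max(0, math.floor(r / 2 - root))]       # likely below the median
        u = sample[min(r - 1, math.ceil(r / 2 + root))]    # likely above the median
        below = sum(1 for x in data if x < d)
        middle = [x for x in data if d <= x <= u]
        brackets = below <= k < below + len(middle)        # does [d, u] hold rank k?
        if brackets and len(middle) <= 4 * r:              # and is it small enough?
            middle.sort()
            return middle[k - below]
        # Otherwise this attempt failed; draw a fresh sample and try again.
    return sorted(data)[k]                # assumed fallback so the loop always ends


gen = random.Random(1)
data = [gen.randrange(1000) for _ in range(5001)]
print(las_vegas_median(data, rng=random.Random(2)) == sorted(data)[2500])
```

The two passes over the full list are linear, and only the small middle slice is ever sorted, which is where the linear running time comes from.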

生来不讨喜

2021-02-14 05:02

Yes. You can do it (deterministically) in O(n).
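
The answer does not name an algorithm; one well-known deterministic option is median of medians (groups of 5), sketched below in Python under that assumption. The function names are illustrative, and `median` returns the lower middle value for even-length lists.

```python
def mom_select(values, k):
    """Return the k-th smallest element (0-indexed) in worst-case O(n) time."""
    items = list(values)
    if not 0 <= k < len(items):
        raise IndexError("k is out of range")
    while True:
        if len(items) <= 5:
            return sorted(items)[k]
        # Median of each group of at most 5 elements.
        medians = []
        for i in range(0, len(items), 5):
            group = sorted(items[i:i + 5])
            medians.append(group[len(group) // 2])
        # Recursively pick the median of those medians as the pivot.
        pivot = mom_select(medians, len(medians) // 2)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            k -= len(lows) + n_equal
            items = highs


def median(values):
    """Lower median of a non-empty list."""
    return mom_select(values, (len(values) - 1) // 2)


print(median([7, 1, 5, 3, 9]))
```

The pivot is guaranteed to discard a constant fraction of the elements each round, which is what removes the bad worst case of a random or fixed pivot.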
