Algorithm to find 100 closest stars to the origin

后端未结

关注

 5  721

花落未央 2021-01-31 11:34

First let me phrase the proper question:

Q: There is a file containing more than a million points (x,y) each of which represents a star. There is a planet earth

5条回答

鱼传尺愫 (楼主)

2021-01-31 12:31

You can actually do this in time O(n) and space O(k), where k is the number of closest points that you want, by using a very clever trick.

The selection problem is as follows: Given an array of elements and some index i, rearrange the elements of the array such that the ith element is in the right place, all elements smaller than the ith element are to the left, and all elements greater than the ith element are to the right. For example, given the array

40 10 00 30 20

If I tried to select based on index 2 (zero-indexed), one result might be

10 00 20 40 30

Since the element at index 2 (20) is in the right place, the elements to the left are smaller than 20, and the elements to the right are greater than 20.

It turns out that since this is a less strict requirement than actually sorting the array, it's possible to do this in time O(n), where n is the number of elements of the array. Doing so requires some complex algorithms like the median-of-medians algorithm, but is indeed O(n) time.

So how do you use this here? One option is to load all n elements from the file into an array, then use the selection algorithm to select the top k in O(n) time and O(n) space (here, k = 100).

But you can actually do better than this! For any constant k that you'd like, maintain a buffer of 2k elements. Load 2k elements from the file into the array, then use the selection algorithm to rearrange it so that the smallest k elements are in the left half of the array and the largest are in the right, then discard the largest k elements (they can't be any of the k closest points). Now, load k more elements from the file into the buffer and do this selection again, and repeat this until you've processed every line of the file. Each time you do a selection you discard the largest k elements in the buffer and retain the k closest points you've seen so far. Consequently, at the very end, you can select the top k elements one last time and find the top k.

What's the complexity of the new approach? Well, you're using O(k) memory for the buffer and the selection algorithm. You end up calling select on a buffer of size O(k) a total of O(n / k) times, since you call select after reading k new elements. Since select on a buffer of size O(k) takes time O(k), the total runtime here is O(n + k). If k = O(n) (a reasonable assumption), this takes time O(n), space O(k).

Hope this helps!

0 讨论(0)

查看其它5个回答

发布评论:

提交评论

加载中...

验证码

看不清?

提交回复