Minimize the sum of errors of representative integers

滥情空心 2021-02-07 20:15

Given n integers D1, D2, ..., Dn in the range [0, 10000], where there may be duplicates and n can be huge:

I want to find k distinct representative integers …

  • 2021-02-07 20:47

    Although, with help from the community, you have managed to state your problem in a form that is mathematically understandable, you still do not supply enough information for me (or anyone else) to give you a definitive answer. (I would have posted this as a comment, but for some reason the "add comment" option is not available to me.) To give a good answer, we need to know the sizes of n and k relative to each other and to 10000 (or to the expected maximum of the Di, if it isn't 10000). We also need to know whether it is critical to attain the exact minimum, even if that requires an exorbitant amount of computation, or whether a close approximation would be acceptable, and if so, how close it needs to be. Finally, to know which algorithm runs in minimum time, we need to understand what hardware will run it (i.e., how many CPU cores m are available to run in parallel, and how large m is relative to k).

    Another important piece of information is whether this problem will be solved only once, or whether it must be solved many times with some connection between the distributions of the integers Di from one problem to the next (e.g., the Di are all random samples from a particular, unchanging probability distribution, or each successive problem's input is the previous problem's set plus s extra integers).

    No reasonable algorithm for your problem should need more than time linear in n: building a histogram of the n integers Di takes O(n) time, and the answer to the optimization problem depends only on that histogram, not on the order of the integers. Conversely, no algorithm can run in time less than linear in n, since it must at least read all n inputs.
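    A minimal Python sketch of the O(n) histogram step described above, assuming the Di arrive as an iterable of ints in [0, 10000]. The names `build_histogram` and `add_values` are hypothetical. The second function covers the incremental case mentioned earlier, where each new problem adds s integers to the previous set: the histogram can then be updated in O(s) time instead of being rebuilt.

```python
from collections.abc import Iterable

MAX_VALUE = 10000  # upper end of the value range given in the question


def build_histogram(values: Iterable[int], max_value: int = MAX_VALUE) -> list[int]:
    """Count how often each integer in [0, max_value] occurs, in one O(n) pass.

    The result ignores input order, which is why any objective that depends
    only on the multiset of Di can be evaluated from it.
    """
    hist = [0] * (max_value + 1)
    for d in values:
        if not 0 <= d <= max_value:
            raise ValueError(f"value {d} outside [0, {max_value}]")
        hist[d] += 1
    return hist


def add_values(hist: list[int], new_values: Iterable[int]) -> None:
    """Hypothetical incremental update: add s new values in place in O(s)."""
    for d in new_values:
        if not 0 <= d < len(hist):
            raise ValueError(f"value {d} outside [0, {len(hist) - 1}]")
        hist[d] += 1
```

    Because `build_histogram([3, 1, 3, 0], max_value=5)` and `build_histogram([0, 3, 1, 3], max_value=5)` produce the same list, the later optimization never needs the original ordering.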

    Assuming that at least one of the Di is 0 and another is 10000, a brute-force search over all of the possibilities requires, for small k (say k < 10), approximately $O(10000^{k-2})$ time. Since $10000^{k-2} = 10^{4(k-2)}$, if $\log_{10}(n) \gg 4(k-2)$ brute force is the optimal algorithm, because the search time is then insignificant compared with the time needed to read the input. It is also interesting to note that if k is very close to 10000, a brute-force search only requires $O(10000^{10002-k})$ time, because we can instead search over the integers that are not used as representative integers.
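    A small Python sketch of the brute-force idea, evaluated on the histogram rather than on the raw Di. The question's definition of the error is truncated, so the error model here is a loud assumption: `nearest_abs_error` hypothetically represents each value by its nearest representative and charges the absolute difference. Any other per-value error can be passed in through the `error` parameter. Unlike the estimate above, this sketch does not pin 0 and 10000 as representatives; it simply enumerates every k-subset, so it is only feasible for tiny ranges or very small k.

```python
from itertools import combinations


def nearest_abs_error(d, reps):
    # HYPOTHETICAL error model (the question's definition is cut off):
    # each value is represented by its nearest representative.
    return min(abs(d - r) for r in reps)


def total_error(hist, reps, error=nearest_abs_error):
    # Each distinct value d contributes count * error(d), so the cost
    # depends only on the histogram, never on the order of the Di.
    return sum(count * error(d, reps) for d, count in enumerate(hist) if count)


def brute_force(hist, k, error=nearest_abs_error):
    """Try every k-subset of [0, len(hist) - 1] and keep the cheapest one."""
    if not 1 <= k <= len(hist):
        raise ValueError("need 1 <= k <= number of candidate values")
    best_cost, best_reps = None, None
    for reps in combinations(range(len(hist)), k):
        cost = total_error(hist, reps, error)
        if best_cost is None or cost < best_cost:
            best_cost, best_reps = cost, reps
    return best_cost, best_reps
```

    For the values 0, 0, 1, 5, 6, 10 (histogram `[2, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1]`) and k = 2, this returns `(6, (0, 6))` under the assumed nearest-representative model. The complement trick for k close to the range size would enumerate the excluded values instead, which visits the same subsets in far fewer iterations.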

    If you update the definition of the problem with more information, I will attempt to edit my answer in turn.

  • 2021-02-07 20:48

    If the distribution is close to random and the sample size n is large enough, you are generally wasting time trying to optimize: you pay a real cost in computation time for diminishing percentage improvements over the expected average. The fastest solution on average is to place the lower k-1 representatives at the low ends of k-1 equal intervals of width M/(k-1), where M is the least upper bound minus the greatest lower bound (i.e., M = maximum possible value - 0), and the k-th representative at M+1. Computing those values takes O(k) time, which is the best we can do with the information presented in this problem. Of course, stating this is not a proof.
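    The O(k) placement rule described above can be sketched in Python as follows. The function name `evenly_spaced_representatives` is hypothetical, and rounding the interval starts down with integer division is an assumption, since the representatives must be integers and M/(k-1) need not be. The last value is M+1, exactly as the answer states.

```python
def evenly_spaced_representatives(k: int, M: int = 10000) -> list[int]:
    """k-1 representatives at the low ends of k-1 equal-width intervals
    over [0, M], plus one more at M + 1. Runs in O(k) time."""
    if k < 2:
        raise ValueError("need k >= 2")
    if k - 1 > M:
        raise ValueError("intervals narrower than 1 would repeat values")
    # Integer division rounds each interval start down; with width >= 1
    # the starts are strictly increasing, so all values stay distinct.
    lower = [i * M // (k - 1) for i in range(k - 1)]
    return lower + [M + 1]
```

    For example, `evenly_spaced_representatives(5)` gives `[0, 2500, 5000, 7500, 10001]`. No histogram is consulted at all, which is what makes it O(k) and also why it is only a reasonable choice when the data really are spread evenly.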

    My point is this: the simplification above is, I think, very practical for one large class of sets. At the other extreme, it is straightforward to compute every possible error for all permutations and then select the smallest one, but the running time makes that solution intractable in many cases. That the person asking this question expects more than the most direct and exact (but intractable) answer leaves much open-ended. We could trim at the edges from here to eternity, trying to quantify all sorts of properties across the infinite solution space of all possible permutations (or combinations) of n numbers and all k values.
