Question
I have 100 groups, and each group contains some number of elements. For cross-validation, I want to make five bins whose sizes are as equal as possible.
Is there an algorithm for this purpose?
An example for 5 groups and 2 bins:
Group_1: 5
Group_2: 6
Group_3: 2
Group_4: 7
Group_5: 1
The two bins will be:
G1 and G2 -> their sum is equal to 11.
G3, G4 and G5 -> their sum is equal to 10.
Answer 1:
This seems related to the partition problem (in its k-way form, multiway number partitioning), which is NP-hard but fortunately admits lots of good approximation algorithms and pseudopolynomial-time dynamic programming algorithms. You may want to look into those as a starting point, since quite a lot of work has already been done in this area.
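As an illustration of the kind of approximation mentioned above, here is a minimal sketch of one common greedy heuristic (sort the groups by size, largest first, and always put the next group into the currently lightest bin). It is not guaranteed to be optimal, and the function name `greedy_bins` and the dictionary layout are my own assumptions, not anything from the question.

```python
import heapq

def greedy_bins(group_sizes, n_bins):
    """Largest-first greedy: put each group into the currently lightest bin."""
    # Heap entries are (current total, bin index), so the lightest bin pops first.
    heap = [(0, b) for b in range(n_bins)]
    heapq.heapify(heap)
    bins = [[] for _ in range(n_bins)]
    # Visit groups from largest to smallest.
    for name, size in sorted(group_sizes.items(), key=lambda kv: kv[1], reverse=True):
        total, b = heapq.heappop(heap)
        bins[b].append(name)
        heapq.heappush(heap, (total + size, b))
    totals = [sum(group_sizes[g] for g in members) for members in bins]
    return bins, totals

# The example from the question: 5 groups, 2 bins.
groups = {"Group_1": 5, "Group_2": 6, "Group_3": 2, "Group_4": 7, "Group_5": 1}
bins, totals = greedy_bins(groups, 2)
print(bins, totals)
```

On the question's example this yields bin totals of 10 and 11, the same split as the one shown in the question. For 100 groups and 5 bins you would call `greedy_bins(groups, 5)`.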
Hope this helps!
Answer 2:
This is not a cluster analysis problem (I rewrote the question to use the more appropriate wording for you). Cluster analysis is a structure discovery task.
Instead, have a look at the following three related problems from computer science:
- Multiprocessor scheduling seems to be what you need: given n processors, distribute the tasks so that as little processor time as possible goes unused.
- Bin packing problem: a classic NP-hard problem that solves the reverse problem, namely using as few bins of a fixed size as possible to accommodate all tasks.
- k-Partition Problem: this is probably what you want to do.
All of these appear to be NP-hard, so for large data you will want to use an approximation only. With just 5 groups, as in your example, you can easily brute-force all combinations.
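For a tiny input like the 5-group example, brute force could look like the sketch below. It tries every assignment of groups to bins and keeps the one with the smallest difference between the heaviest and lightest bin. The name `best_split` and the choice of "max minus min" as the balance measure are assumptions made for illustration. The number of assignments grows as n_bins to the power of the number of groups, so this is only practical for very small inputs.

```python
from itertools import product

def best_split(sizes, n_bins):
    """Exhaustively try every group-to-bin assignment; keep the most balanced one."""
    best_assignment, best_spread = None, None
    # Each assignment is a tuple giving the bin index chosen for every group.
    for assignment in product(range(n_bins), repeat=len(sizes)):
        totals = [0] * n_bins
        for size, b in zip(sizes, assignment):
            totals[b] += size
        spread = max(totals) - min(totals)
        if best_spread is None or spread < best_spread:
            best_assignment, best_spread = assignment, spread
    return best_assignment, best_spread

# Group sizes from the question, in order Group_1 .. Group_5.
sizes = [5, 6, 2, 7, 1]
assignment, spread = best_split(sizes, 2)
print(assignment, spread)
```

Since the sizes sum to 21, the best achievable difference between two bins is 1, which matches the 11 and 10 split given in the question.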
Answer 3:
If you're looking for a clustering algorithm (a partitioning method) with an equal-size constraint, I would suggest spectral clustering. It tends to produce clusters of almost the same size because it solves the normalized cut problem, which tries to find a balanced cut.
Source: https://stackoverflow.com/questions/27338915/filling-bins-with-an-equal-size