I need help selecting or creating a clustering algorithm according to certain criteria.
Imagine you are managing newspaper delivery persons.
I acknowledge that this will not necessarily provide clusters of roughly equal size:
One of the best current techniques in data clustering is Evidence Accumulation (Fred and Jain, 2005). What you do is:
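Given a data set with n patterns:

1. Use an algorithm like k-means over a range of k, or use a set of different algorithms. The goal is to produce an ensemble of N partitions.
2. Create an n x n co-association matrix C, initialized to zeros.
3. For each partition p in the ensemble:
   3.1 Update the co-association matrix: for each pattern pair (i, j) that belongs to the same cluster in p, set C(i, j) = C(i, j) + 1/N, where N is the number of partitions in the ensemble. C(i, j) therefore ends up as the fraction of partitions in which i and j were clustered together.
4. Run a clustering algorithm such as Single Link, using C as the proximity measure (C is a similarity, so in practice you cluster on the dissimilarity 1 - C). Single Link produces a dendrogram, from which you choose the clustering with the longest lifetime, i.e. the number of clusters that survives over the widest range of merge thresholds.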
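A minimal NumPy sketch of the steps above, under a few assumptions of mine: the ensemble comes from k-means with k-means++ seeding, run several times per k (the answer only says "an algorithm like k-means"); Single Link runs on the dissimilarity 1 - C; and the trivial one-cluster partition is excluded when picking the longest lifetime. The function names (`kmeans`, `co_association`, `mst_edges`, `single_link_longest_lifetime`, `evidence_accumulation`) and the parameters `runs_per_k` and `seed` are hypothetical, not taken from the paper or from any library.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's k-means; returns one cluster label per pattern."""
    # Assumption: k-means++ seeding (next center drawn with probability ~ squared distance)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        p = d2 / d2.sum() if d2.sum() > 0 else None
        centers.append(X[rng.choice(len(X), p=p)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # An empty cluster keeps its previous center
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                                for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def co_association(partitions, n):
    """Steps 2-3: C(i, j) += 1/N for every partition where i and j share a cluster."""
    N = len(partitions)
    C = np.zeros((n, n))
    for labels in partitions:
        C += (labels[:, None] == labels[None, :]) / N
    return C


def mst_edges(D):
    """Prim's algorithm on a dense dissimilarity matrix.
    Single-link merge heights are exactly the MST edge weights."""
    n = len(D)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best, nearest = D[0].astype(float), np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.where(in_tree, np.inf, best).argmin())
        edges.append((float(best[j]), int(nearest[j]), j))
        in_tree[j] = True
        closer = D[j] < best
        best = np.where(closer, D[j], best)
        nearest = np.where(closer, j, nearest)
    return sorted(edges)


def single_link_longest_lifetime(D):
    """Step 4: cut the single-link dendrogram at the k (>= 2, an assumption) with the longest lifetime."""
    n = len(D)
    edges = mst_edges(D)
    heights = [0.0] + [w for w, _, _ in edges]  # heights[m] = threshold of the m-th merge
    # k clusters exist for thresholds in [heights[n-k], heights[n-k+1])
    lifetime = {k: heights[n - k + 1] - heights[n - k] for k in range(2, n + 1)}
    k_best = max(lifetime, key=lifetime.get)
    root = list(range(n))

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]
            a = root[a]
        return a

    for _, a, b in edges[: n - k_best]:  # apply the first n - k merges
        root[find(a)] = find(b)
    ids = {}
    labels = np.array([ids.setdefault(find(i), len(ids)) for i in range(n)])
    return labels, k_best


def evidence_accumulation(X, k_values, runs_per_k=10, seed=0):
    rng = np.random.default_rng(seed)
    partitions = [kmeans(X, k, rng) for k in k_values for _ in range(runs_per_k)]
    C = co_association(partitions, len(X))
    return single_link_longest_lifetime(1.0 - C)


# Usage on two toy blobs (illustrative data, not from the original post)
X = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4], [0.5, 0.2], [0.2, 0.6],
              [10.0, 10.0], [10.4, 10.1], [10.1, 10.5], [10.6, 10.3], [10.2, 10.7]])
labels, k = evidence_accumulation(X, k_values=range(2, 5))
print(k, labels)
```

The sketch builds a minimum spanning tree instead of an explicit dendrogram because the single-link merge levels are the sorted MST edge weights, so every lifetime is just a gap between consecutive weights. If SciPy is available, `scipy.cluster.hierarchy.linkage` with `method='single'` on the 1 - C dissimilarities would be a reasonable replacement for `mst_edges`.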
I'll provide descriptions of SL and k-means if you're interested.