You could use a genetic algorithm for this. If you define a "cluster" as, say, a rectangular sub-area with a high dot density, you could create an initial set of "solutions", each consisting of some number of randomly generated, non-overlapping rectangular clusters. You would then write a "fitness function" that scores each solution. In this case, the score should go up as the total number of clusters goes down and as the dot density within each cluster goes up.
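As a rough sketch of that representation and scoring, assuming dots are `(x, y)` tuples on a square area and a rectangle is `(x, y, width, height)`: the names `random_solution`, `density`, `fitness` and the `cluster_penalty` weight are made up for illustration, and the exact scoring formula is just one plausible choice.

```python
import random

def overlaps(a, b):
    """True if two (x, y, w, h) rectangles share any interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def random_solution(rng, size, max_clusters=5, tries=50):
    """Build one candidate: a few random, non-overlapping rectangles."""
    target = rng.randint(1, max_clusters)
    rects = []
    for _ in range(tries):
        if len(rects) == target:
            break
        w = rng.uniform(1, size / 2)
        h = rng.uniform(1, size / 2)
        rect = (rng.uniform(0, size - w), rng.uniform(0, size - h), w, h)
        # Only keep the rectangle if it does not collide with earlier ones.
        if not any(overlaps(rect, other) for other in rects):
            rects.append(rect)
    return rects

def density(rect, dots):
    """Dots inside the rectangle divided by its area."""
    x, y, w, h = rect
    inside = sum(1 for px, py in dots if x <= px <= x + w and y <= py <= y + h)
    return inside / (w * h)

def fitness(solution, dots, cluster_penalty=0.05):
    """Higher is better: dense rectangles score well, extra rectangles cost.

    cluster_penalty is an assumed tuning knob, not something fixed by the method.
    """
    if not solution:
        return 0.0
    mean_density = sum(density(r, dots) for r in solution) / len(solution)
    return mean_density - cluster_penalty * len(solution)
```

One thing to watch with a score like this: density alone can favor a tiny rectangle around a single dot, so in practice you would probably also want a minimum rectangle size or a reward for the number of dots covered.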
Your initial set of "solutions" will most likely all be terrible, but some will be slightly less terrible than others. You use the fitness function to eliminate the worst solutions, then create the next generation by cross-breeding the "winners" from the last one. By repeating this process generation after generation, you should end up with one or more good solutions to the problem.
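The generation loop could look something like the sketch below. It is self-contained and makes several assumptions of its own: rectangles are `(x, y, width, height)` tuples, the fitness formula and its `penalty` weight are illustrative, the top half of each generation survives unchanged, and a small random mutation step is added on top of cross-breeding (a common companion to crossover, though not mentioned above). All function names are hypothetical.

```python
import random

def overlaps(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def density(rect, dots):
    x, y, w, h = rect
    return sum(1 for px, py in dots if x <= px <= x + w and y <= py <= y + h) / (w * h)

def fitness(sol, dots, penalty=0.05):
    # Assumed scoring: mean density minus a small cost per rectangle.
    return sum(density(r, dots) for r in sol) / len(sol) - penalty * len(sol)

def random_rect(rng, size):
    w, h = rng.uniform(1, size / 2), rng.uniform(1, size / 2)
    return (rng.uniform(0, size - w), rng.uniform(0, size - h), w, h)

def add_if_free(rects, rect):
    # Keep solutions valid: never add a rectangle that overlaps another.
    if not any(overlaps(rect, other) for other in rects):
        rects.append(rect)

def random_solution(rng, size, max_clusters=5):
    sol = []
    for _ in range(rng.randint(1, max_clusters)):
        add_if_free(sol, random_rect(rng, size))
    return sol

def crossover(a, b, rng, max_clusters=5):
    """Child takes a random mix of both parents' rectangles."""
    pool = a + b
    rng.shuffle(pool)
    limit = rng.randint(1, max_clusters)
    child = []
    for rect in pool:
        if len(child) < limit:
            add_if_free(child, rect)
    return child

def mutate(sol, rng, size, step=0.5):
    """Nudge one rectangle; discard the nudge if it would cause an overlap."""
    i = rng.randrange(len(sol))
    x, y, w, h = sol[i]
    w = min(max(w + rng.uniform(-step, step), 1.0), size)
    h = min(max(h + rng.uniform(-step, step), 1.0), size)
    x = min(max(x + rng.uniform(-step, step), 0.0), size - w)
    y = min(max(y + rng.uniform(-step, step), 0.0), size - h)
    others = sol[:i] + sol[i + 1:]
    moved = (x, y, w, h)
    if any(overlaps(moved, other) for other in others):
        return sol
    return others + [moved]

def evolve(dots, size=10.0, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    population = [random_solution(rng, size) for _ in range(pop_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, dots), reverse=True)
        history.append(fitness(population[0], dots))
        winners = population[: pop_size // 2]  # eliminate the worst half
        children = [
            mutate(crossover(rng.choice(winners), rng.choice(winners), rng), rng, size)
            for _ in range(pop_size - len(winners))
        ]
        population = winners + children
    return max(population, key=lambda s: fitness(s, dots)), history
```

Calling `best, history = evolve(dots)` with your own list of `(x, y)` dots returns the best set of rectangles found plus the best score per generation. Because the winners are carried over unchanged, that score never gets worse from one generation to the next.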
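For a genetic algorithm to work, the possible solutions have to differ from each other incrementally in how well they solve the problem, so that a small change to a solution produces a small change in its score. Dot clusters are a good fit for this: nudging or resizing a rectangle a little usually changes its dot density only a little.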