Parameter estimation in DBSCAN

Submitted by 安稳与你 on 2020-06-10 03:41:59

Question


I need to find naturally occurring classes of nouns based on their distribution with different prepositions (agentive, instrumental, time, place, etc.). I tried k-means clustering, but it did not work well: there was a lot of overlap between the classes I was looking for (probably because of the non-globular shape of the classes and the random initialisation in k-means).

I am now trying DBSCAN, but I have trouble understanding the epsilon and min-points (minPts) parameters of this clustering algorithm. Can I use random values, or do I need to compute them? Can anybody help, particularly with epsilon: how do I compute it, if I need to?


Answer 1:


Use your domain knowledge to choose the parameters. Epsilon is a radius: the distance within which two points count as neighbors. You can think of it as a minimum cluster size in terms of extent.

Obviously random values won't work very well. As a heuristic, you can look at a k-distance plot, but reading a threshold off it is not automatic either.
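To make the k-distance heuristic concrete, here is a minimal sketch using only the Python standard library. The helper name `k_distances` and the toy points are made up for illustration; with real data you would plot the sorted curve and look for a "knee" by eye, which is exactly the part that is not automatic.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point (the point itself is excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result, reverse=True)

# Hypothetical toy data: two tight groups and one far-away outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (50.0, 50.0)]

curve = k_distances(points, k=2)
for rank, d in enumerate(curve):
    print(rank, round(d, 3))
```

On this toy data the curve has one large value (the outlier) followed by a flat run of small values; an epsilon somewhere just above the flat part would be a starting candidate, not a final answer.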

The first thing to do either way is to choose a good distance function for your data and to perform appropriate normalization.
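As a rough illustration of why normalization matters here, the sketch below compares Euclidean distance on raw co-occurrence counts with the same distance on per-noun proportions. The nouns, the two preposition columns, and the counts are all invented for the example; whether proportions (or some other distance entirely) suit your data is a judgment you have to make yourself.

```python
import math

def to_proportions(counts):
    """Scale a count vector so its entries sum to 1 (left unchanged if all zero)."""
    total = sum(counts)
    return [c / total for c in counts] if total else list(counts)

# Hypothetical counts of each noun with two prepositions, e.g. ("with", "on").
raw = {
    "hammer": [40, 10],   # frequent noun, mostly instrumental
    "nail":   [4, 1],     # rare noun, same profile as "hammer"
    "monday": [5, 5],     # rare noun, different profile
}
prop = {noun: to_proportions(counts) for noun, counts in raw.items()}

for name, table in (("raw counts", raw), ("proportions", prop)):
    d_same = math.dist(table["hammer"], table["nail"])
    d_diff = math.dist(table["nail"], table["monday"])
    print(f"{name}: hammer-nail={d_same:.3f}, nail-monday={d_diff:.3f}")
```

On raw counts, "nail" ends up closer to "monday" than to "hammer" simply because both are rare; after normalization the two nouns with the same distribution are the close pair. Any epsilon you pick only means something relative to the distance you chose.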

As for "minPts", it again depends on your data and needs. One user may want a very different value than another. And of course minPts and epsilon are coupled. If you double epsilon, you will roughly need to increase your minPts by a factor of 2^d, where d is the number of dimensions (for Euclidean distance, because that is how the volume of a hypersphere increases!)
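The 2^d figure can be checked numerically. The sketch below uses the standard closed-form volume of a d-dimensional ball, V_d(r) = pi^(d/2) / Gamma(d/2 + 1) * r^d; the function name `ball_volume` is just a label chosen for this example.

```python
import math

def ball_volume(radius, dim):
    """Volume of a dim-dimensional Euclidean ball of the given radius."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim

# Ratio of volumes when the radius (epsilon) is doubled, for a few dimensions.
ratios = {dim: ball_volume(2.0, dim) / ball_volume(1.0, dim) for dim in (1, 2, 3, 10)}
for dim, ratio in ratios.items():
    print(f"d={dim}: volume grows by a factor of about {ratio:.0f}")
```

Since the volume scales as r^d, doubling the radius multiplies it by 2^d regardless of the constant in front, so at uniform density the expected number of points in the neighborhood grows by the same factor. Real data is rarely uniform, which is why this is only a rough guide.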

If you want lots of small and finely detailed clusters, choose a low minPts. If you want larger and fewer clusters (and more noise), use a larger minPts. If you don't want any clusters at all, choose minPts larger than your data set size...
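To see the effect of minPts, here is a small, unoptimized DBSCAN sketch in plain Python, run on made-up points with a fixed epsilon. It is written for illustration only (quadratic neighbor search, invented data); a library implementation would be the sensible choice for real work.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def neighbors(i):
        # The neighborhood includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # core point: keep growing the cluster
                queue.extend(nbrs)
    return labels

def summarize(labels):
    """Return (number of clusters, number of noise points)."""
    return len({l for l in labels if l != NOISE}), labels.count(NOISE)

# Hypothetical data: two groups of four, one pair, and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (30.0, 30.0), (30.0, 31.0),
          (50.0, 50.0)]

for min_pts in (2, 3, len(points) + 1):
    print(min_pts, summarize(dbscan(points, eps=1.5, min_pts=min_pts)))
```

With the low setting the pair counts as its own small cluster; raising minPts turns it into noise, and setting minPts above the data set size leaves no clusters at all.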



Source: https://stackoverflow.com/questions/15050389/parameter-estimation-in-dbscan
