Parameter estimation in DBSCAN

太阳男子 2021-02-15 13:49

I need to find naturally occurring classes of nouns based on their distribution with different prepositions (such as agentive, instrumental, time, place, etc.). I tried using k-means

1 answer
  •  说谎 (OP)
    2021-02-15 14:29

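    Use your domain knowledge to choose the parameters. Epsilon is a radius: the size of the neighborhood around each point, measured in the units of your distance function. You can think of it roughly as a minimum cluster size in terms of spatial extent.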

    Obviously, random values won't work very well. As a heuristic, you can look at a k-distance plot (each point's distance to its k-th nearest neighbor, sorted) and look for a knee, but that is not automatic either.
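
    The k-distance heuristic mentioned above can be sketched in a few lines. This is only an illustration on made-up toy points, using brute-force distances and the standard library; the function name `k_distances` is hypothetical. A common convention (an assumption here, not something stated in the answer) is to take k = minPts - 1, because minPts usually counts the point itself.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point (the point itself is excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result, reverse=True)

# Toy data (made up): two tight groups plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (10.0, 0.0)]

kd = k_distances(points, k=3)
for rank, d in enumerate(kd):
    # Plot or print these values; a sharp drop suggests a candidate epsilon.
    print(rank, round(d, 3))
```

    On data like this the sorted values drop sharply after the isolated point, and a value just above the flat part of the curve is a candidate epsilon. On real data the knee is often much less clear, which is why the choice still needs judgment.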

    The first thing to do either way is to choose a good distance function for your data and to apply appropriate normalization.
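
    To illustrate why normalization and the distance function matter, here is a minimal sketch for data like that in the question. The counts, the preposition columns, and the helper names are all invented for illustration; relative frequencies and cosine distance are just one plausible choice, not a recommendation from the answer.

```python
import math

def to_relative_frequencies(counts):
    """Scale a count vector so its entries sum to 1 (unchanged if all zero)."""
    total = sum(counts)
    return [c / total for c in counts] if total else list(counts)

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical co-occurrence counts with the prepositions ["by", "with", "at", "in"].
hammer = [2, 40, 1, 3]
knife = [1, 20, 0, 2]
morning = [0, 1, 5, 90]

# Euclidean distance on raw counts is dominated by how frequent a noun is.
raw = math.dist(hammer, knife)
# After normalization, nouns with the same distribution end up close together.
normalized = math.dist(to_relative_frequencies(hammer),
                       to_relative_frequencies(knife))

print(round(raw, 2), round(normalized, 4))
print(cosine_distance(hammer, knife), cosine_distance(hammer, morning))
```

    Here the two instrument-like nouns are far apart on raw counts only because one is twice as frequent, while after normalization (or with cosine distance, which ignores vector length) they are close to each other and far from the time-like noun.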

    As for "minPts", it again depends on your data and needs. One user may want a very different value than another. And of course minPts and epsilon are coupled: if you double epsilon, you will roughly need to multiply minPts by 2^d to keep the same density threshold (for Euclidean distance in d dimensions, because that is how the volume of a hypersphere grows with its radius).
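
    The coupling between the two parameters follows from the volume of a d-dimensional ball, which is proportional to radius^d. A small sketch, assuming roughly uniform density inside the neighborhood; `rescale_min_pts` is a hypothetical helper, not part of any library:

```python
import math

def ball_volume(radius, d):
    """Volume of a d-dimensional Euclidean ball with the given radius."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * radius ** d

def rescale_min_pts(min_pts, eps_old, eps_new, d):
    """Scale minPts so the implied density threshold stays about the same."""
    return max(1, round(min_pts * (eps_new / eps_old) ** d))

# Doubling the radius multiplies the volume by 2^d.
for d in (2, 3, 5):
    ratio = ball_volume(2.0, d) / ball_volume(1.0, d)
    print(d, round(ratio, 6), rescale_min_pts(4, 0.5, 1.0, d))
```

    In 5 dimensions, doubling epsilon already calls for about 32 times as many points, which is one reason the parameters are hard to tune independently in higher-dimensional data.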

    If you want lots of small, finely detailed clusters, choose a low minPts. If you want larger and fewer clusters (and more noise), use a larger minPts. If you don't want any clusters at all, choose minPts larger than your data set size...
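
    The effect of minPts described above can be seen with a minimal DBSCAN sketch. This is a brute-force illustration on made-up points, not a production implementation; it assumes that minPts counts the point itself, and the names `dbscan` and `summarize` are hypothetical.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch; returns one label per point (-1 means noise)."""
    n = len(points)
    # Brute-force eps-neighborhoods; each point counts as its own neighbor.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # core point: keep expanding
        cluster += 1
    return labels

def summarize(labels):
    """Return (number of clusters, number of noise points)."""
    return len(set(labels) - {NOISE}), labels.count(NOISE)

# Toy data (made up): a group of 6, a group of 3, and one isolated point.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (0.1, 0.1), (0.3, 0.1),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2),
          (10.0, 10.0)]

for min_pts in (3, 5, len(points) + 1):
    print(min_pts, summarize(dbscan(points, eps=0.5, min_pts=min_pts)))
```

    With a low minPts both groups are found; raising it turns the smaller group into noise; and with minPts larger than the data set every point is noise.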
