DBSCAN error with cosine metric in Python

梦毁少年i 2021-01-18 08:50

I was trying to use the DBSCAN algorithm from the scikit-learn library with the cosine metric, but got stuck on an error. The line of code is:

db = DBSCAN(eps=1, min_s         


        
2 answers
  •  鱼传尺愫
    2021-01-18 09:04

    If you want a normalized distance like the cosine distance, you can also normalize your vectors first and then use the Euclidean metric. For two normalized vectors u and v, the Euclidean distance equals sqrt(2 - 2*cos(u, v)), where cos(u, v) is their cosine similarity (see this discussion).

    You can hence do something like:

    import numpy as np
    from sklearn.cluster import DBSCAN

    # Scale each row of X to unit L2 norm (assumes X has no all-zero rows)
    Xnormed = X / np.linalg.norm(X, axis=1, keepdims=True)
    db = DBSCAN(eps=0.5, min_samples=2, metric='euclidean').fit(Xnormed)
    

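    The Euclidean distances between normalized vectors lie in [0, 2], so make sure you adjust your parameters accordingly: a cosine distance d = 1 - cos(u, v) corresponds to a Euclidean distance of sqrt(2*d), and eps has to be chosen on that scale.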
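
    As a rough sketch of that adjustment, the snippet below checks the identity numerically on two made-up vectors and converts a cosine-distance threshold into the matching Euclidean eps. The helper names cosine_to_euclidean_eps and normalize are hypothetical (they are not part of scikit-learn), and the threshold 0.125 is only an illustrative value.

```python
import math

def cosine_to_euclidean_eps(cos_dist):
    """Map a cosine-distance threshold (1 - cos) to the equivalent
    Euclidean threshold for unit-length vectors: sqrt(2 * cos_dist)."""
    if not 0.0 <= cos_dist <= 2.0:
        raise ValueError("cosine distance must lie in [0, 2]")
    return math.sqrt(2.0 * cos_dist)

def normalize(vec):
    """Scale a vector to unit L2 norm (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Two made-up vectors, used only to check the identity numerically.
u = normalize([3.0, 4.0])
v = normalize([4.0, 3.0])
cos_uv = sum(a * b for a, b in zip(u, v))  # cosine similarity of u and v
euclid = math.dist(u, v)                   # Euclidean distance (Python 3.8+)

# Identity from the answer: ||u - v|| = sqrt(2 - 2*cos(u, v))
assert math.isclose(euclid, math.sqrt(2.0 - 2.0 * cos_uv))

# Hypothetical cosine-distance threshold, converted to a Euclidean eps.
eps = cosine_to_euclidean_eps(0.125)
```

    The resulting eps is the value you would pass as DBSCAN(eps=eps, metric='euclidean') when fitting on the normalized data.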
