Most efficient way to construct similarity matrix

孤城傲影 2020-12-31 13:53

I'm using the following links to create a "Euclidean Similarity Matrix" (which I then convert to a DataFrame). https://stats.stackexchange.com/questions/53068/euclidean-distan

5 Answers
  •  一整个雨季
    2020-12-31 14:20

    There are two useful functions in scipy.spatial.distance that you can use for this: pdist and squareform. pdist returns the pairwise distances between observations (rows) as a one-dimensional "condensed" array, and squareform converts that array into a square, symmetric distance matrix.
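    To illustrate the two shapes, here is a small sketch on a made-up 3×2 array (the data is purely illustrative). For n rows, pdist returns the n(n-1)/2 upper-triangle entries, and squareform expands them into an n×n symmetric matrix with zeros on the diagonal.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Illustrative data: 3 observations (rows) with 2 features each
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# Condensed form: distances for the pairs (0,1), (0,2), (1,2), in that order
condensed = pdist(X, "euclidean")

# Square form: symmetric 3x3 matrix, zeros on the diagonal
square = squareform(condensed)

print(condensed)
print(square)
```

    Here the condensed array holds 5, 10 and 5, and square[0, 2] and square[2, 0] both equal 10.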

    One catch is that pdist computes distances, not similarities, so you need to pass your similarity function as the metric argument. Also, judging by the commented output in your code, your DataFrame is not in the orientation pdist expects (one observation per row), so I've undone the transpose you did in your code.
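    A minimal sketch of both points, using a hypothetical DataFrame `df` (not the data from the question): pdist accepts any callable as the metric and calls it once per pair of rows, and because it always compares rows, you transpose first if your observations are stored as columns.

```python
import pandas as pd
from scipy.spatial.distance import euclidean, pdist

def similarity_func(u, v):
    # Custom metric: pdist calls this once for every pair of rows
    return 1 / (1 + euclidean(u, v))

# Hypothetical data: rows "a", "b", "c" are observations, columns are features
df = pd.DataFrame({"x": [0.0, 3.0, 6.0], "y": [0.0, 4.0, 8.0]},
                  index=["a", "b", "c"])

# pdist compares ROWS: 3 rows -> 3 pairs (a-b, a-c, b-c)
row_sims = pdist(df, similarity_func)

# To compare COLUMNS instead, transpose first: 2 columns -> 1 pair (x-y)
col_sims = pdist(df.T, similarity_func)

print(row_sims)
print(col_sims)
```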

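    import numpy as np
    import pandas as pd
    from scipy.spatial.distance import euclidean, pdist, squareform
    
    
    def similarity_func(u, v):
        # Map Euclidean distance to a similarity in (0, 1]; identical rows give 1
        return 1 / (1 + euclidean(u, v))
    
    # Rows (g1..g3) are the observations; columns (s1..s4) are the features
    DF_var = pd.DataFrame.from_dict({"s1": [1.2, 3.4, 10.2], "s2": [1.4, 3.1, 10.7],
                                     "s3": [2.1, 3.7, 11.3], "s4": [1.5, 3.2, 10.9]})
    DF_var.index = ["g1", "g2", "g3"]
    
    # Condensed 1-D array of pairwise similarities between rows
    sims = pdist(DF_var, similarity_func)
    # squareform puts 0 on the diagonal, but each row is perfectly similar to itself
    sim_matrix = squareform(sims)
    np.fill_diagonal(sim_matrix, 1)
    DF_euclid = pd.DataFrame(sim_matrix, columns=DF_var.index, index=DF_var.index)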
    
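    Since the question asks for the most efficient approach: passing a Python function as the metric means pdist calls it once per pair of rows in Python, which gets slow for large inputs. A sketch of a faster variant (my own suggestion, not benchmarked on your data) lets pdist compute plain distances with its built-in "euclidean" metric and then applies the same 1/(1 + d) mapping to the whole matrix with NumPy. Because the distance matrix has zeros on its diagonal, the diagonal of the similarity matrix comes out as 1 with no extra step.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

DF_var = pd.DataFrame.from_dict({"s1": [1.2, 3.4, 10.2], "s2": [1.4, 3.1, 10.7],
                                 "s3": [2.1, 3.7, 11.3], "s4": [1.5, 3.2, 10.9]})
DF_var.index = ["g1", "g2", "g3"]

# Built-in metric string: distances are computed in compiled code,
# with no per-pair Python function call
dist_matrix = squareform(pdist(DF_var.to_numpy(), metric="euclidean"))

# Same mapping as similarity_func, applied elementwise to the whole matrix;
# the zero diagonal of dist_matrix becomes 1 automatically
sim_matrix = 1.0 / (1.0 + dist_matrix)

DF_euclid = pd.DataFrame(sim_matrix, index=DF_var.index, columns=DF_var.index)
print(DF_euclid)
```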
