I'm using the following links to create a "Euclidean Similarity Matrix" (that I convert to a DataFrame). https://stats.stackexchange.com/questions/53068/euclidean-distan
The simplest way I can find to get the same result as the OP is to use distance_matrix, also from scipy.spatial. The whole thing can be done in one sort-of-long line.
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix
# Original code from OP, slightly reformatted
DF_var = pd.DataFrame.from_dict({
    "s1":[1.2,3.4,10.2],
    "s2":[1.4,3.1,10.7],
    "s3":[2.1,3.7,11.3],
    "s4":[1.5,3.2,10.9]
}).T
DF_var.columns = ["g1","g2","g3"]
# Whole similarity algorithm in one line
df_euclid = pd.DataFrame(
    1 / (1 + distance_matrix(DF_var.T, DF_var.T)),
    columns=DF_var.columns, index=DF_var.columns
)
#           g1        g2        g3
# g1  1.000000  0.215963  0.051408
# g2  0.215963  1.000000  0.063021
# g3  0.051408  0.063021  1.000000
The code above should copy-paste and run in any Python environment that has NumPy, pandas, and SciPy installed.
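To make it clearer what the one-liner computes, here is a rough sketch that rebuilds the same similarity matrix by hand with NumPy broadcasting and compares it against distance_matrix. The array X and the names manual_dist, manual_sim, and scipy_sim are only illustrative; X holds the OP's values with g1..g3 as rows.

```python
import numpy as np
from scipy.spatial import distance_matrix

# Rows are the items being compared (g1, g2, g3); columns are s1..s4.
X = np.array([
    [1.2, 1.4, 2.1, 1.5],
    [3.4, 3.1, 3.7, 3.2],
    [10.2, 10.7, 11.3, 10.9],
])

# Pairwise row differences via broadcasting: shape (3, 3, 4).
diff = X[:, None, :] - X[None, :, :]

# Euclidean distance between every pair of rows: shape (3, 3).
manual_dist = np.sqrt((diff ** 2).sum(axis=-1))

# Map distance d to similarity 1 / (1 + d), so identical rows score 1.
manual_sim = 1.0 / (1.0 + manual_dist)

# The same quantity computed with scipy's distance_matrix.
scipy_sim = 1.0 / (1.0 + distance_matrix(X, X))

print(np.allclose(manual_sim, scipy_sim))
```

The broadcasting version builds an n x n x m intermediate array, so it is meant as an explanation rather than something to use on large inputs.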
There are two useful functions in scipy.spatial.distance that you can use for this: pdist and squareform. pdist gives you the pairwise distances between observations (rows) as a one-dimensional condensed array, and squareform converts that array into a square matrix.
One catch is that pdist computes distances, not similarities, so you need to pass your own similarity function as the metric. Judging by the commented output in your code, your DataFrame is also not in the orientation pdist expects (it compares rows), so I've undone the transpose you did in your code.
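import numpy as np
import pandas as pd
from scipy.spatial.distance import euclidean, pdist, squareform
def similarity_func(u, v):
    # Map a Euclidean distance in [0, inf) to a similarity in (0, 1]
    return 1/(1+euclidean(u,v))
# No transpose here: the rows are g1..g3, and pdist compares rows
DF_var = pd.DataFrame.from_dict({"s1":[1.2,3.4,10.2],"s2":[1.4,3.1,10.7],"s3":[2.1,3.7,11.3],"s4":[1.5,3.2,10.9]})
DF_var.index = ["g1","g2","g3"]
dists = pdist(DF_var, similarity_func)
sim = squareform(dists)
# squareform leaves zeros on the diagonal; self-similarity is 1/(1+0) = 1
np.fill_diagonal(sim, 1.0)
DF_euclid = pd.DataFrame(sim, columns=DF_var.index, index=DF_var.index)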
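A callable metric is invoked from Python once per pair of rows, which can get slow on larger inputs. As a possible alternative (a sketch, not a benchmarked recommendation), you could let pdist use its built-in "euclidean" metric and apply the 1/(1+d) transform to the whole matrix afterwards. The names dist and DF_sim are just illustrative.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Same data as above, with g1..g3 as rows.
DF_var = pd.DataFrame(
    {"s1": [1.2, 3.4, 10.2], "s2": [1.4, 3.1, 10.7],
     "s3": [2.1, 3.7, 11.3], "s4": [1.5, 3.2, 10.9]},
    index=["g1", "g2", "g3"],
)

# Built-in metric: pairwise distances are computed in compiled code.
dist = squareform(pdist(DF_var.to_numpy(), metric="euclidean"))

# Transform the full matrix at once; the zero diagonal becomes 1 / (1 + 0) = 1.
DF_sim = pd.DataFrame(1.0 / (1.0 + dist), index=DF_var.index, columns=DF_var.index)
print(DF_sim)
```

Because the transform is applied after squareform, the diagonal needs no special handling.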
You want scipy.spatial.distance.pdist or sklearn.metrics.pairwise.pairwise_distances.
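For what it's worth, the two functions return differently shaped results: pdist gives a condensed one-dimensional array of the unique pairs, while pairwise_distances gives the full square matrix. A small sketch, assuming scikit-learn is installed; the array X and the variable names are only illustrative, with the OP's g1..g3 as rows.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics.pairwise import pairwise_distances

# Rows are g1, g2, g3; columns are s1..s4.
X = np.array([
    [1.2, 1.4, 2.1, 1.5],
    [3.4, 3.1, 3.7, 3.2],
    [10.2, 10.7, 11.3, 10.9],
])

# Condensed form: one entry per unordered pair of rows, n * (n - 1) / 2 values.
condensed = pdist(X, metric="euclidean")

# Square form: full n x n matrix with zeros on the diagonal.
square = pairwise_distances(X, metric="euclidean")

# squareform expands the condensed vector into the same square matrix.
same = np.allclose(squareform(condensed), square)
print(condensed.shape, square.shape, same)
```

Either result can then be turned into the similarity with 1 / (1 + d).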
This is what I did:
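import pandas as pd
from scipy.spatial.distance import euclidean
DF_var = pd.DataFrame.from_dict({"s1":[1.2,3.4,10.2],"s2":[1.4,3.1,10.7],"s3":[2.1,3.7,11.3],"s4":[1.5,3.2,10.9]}).T
DF_var.columns = ["g1","g2","g3"]
def m_euclid(v1, v2):
    # Similarity in (0, 1]: identical vectors give 1
    return (1/(1 + euclidean(v1,v2)))
# Compare every column (g1..g3) with every other column
dist_list = []
for j1 in DF_var.columns:
    dist_list.append([m_euclid(DF_var[j1], DF_var[j2]) for j2 in DF_var.columns])
# Label rows and columns so the result reads as g1..g3 by g1..g3
dist_matrix = pd.DataFrame(dist_list, index=DF_var.columns, columns=DF_var.columns)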
I think you can just apply pdist and squareform directly to your DataFrame. Note that pdist compares rows and returns distances rather than similarities:
from scipy.spatial.distance import pdist,squareform
In [6]: squareform(pdist(DF_var, metric='euclidean'))
Out[6]:
array([[ 0.        ,  0.6164414 ,  1.4525839 ,  0.78740079],
       [ 0.6164414 ,  0.        ,  1.1       ,  0.24494897],
       [ 1.4525839 ,  1.1       ,  0.        ,  0.87749644],
       [ 0.78740079,  0.24494897,  0.87749644,  0.        ]])
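The 4x4 shape above comes from the orientation: with the transposed DF_var from the question, the rows are s1..s4, so pdist compares samples. If you want g1..g3 compared instead, one option is to pass the transpose. A minimal sketch, with by_sample and by_gene as illustrative names:

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# DF_var as built in the question: rows s1..s4, columns g1..g3.
DF_var = pd.DataFrame.from_dict({
    "s1": [1.2, 3.4, 10.2],
    "s2": [1.4, 3.1, 10.7],
    "s3": [2.1, 3.7, 11.3],
    "s4": [1.5, 3.2, 10.9],
}).T
DF_var.columns = ["g1", "g2", "g3"]

# pdist compares rows, so this is a 4x4 sample-by-sample distance matrix.
by_sample = pd.DataFrame(
    squareform(pdist(DF_var.to_numpy(), metric="euclidean")),
    index=DF_var.index, columns=DF_var.index,
)

# Transposing makes g1..g3 the rows, giving a 3x3 distance matrix.
by_gene = pd.DataFrame(
    squareform(pdist(DF_var.T.to_numpy(), metric="euclidean")),
    index=DF_var.columns, columns=DF_var.columns,
)

print(by_sample.shape, by_gene.shape)
```

Both results are plain distances; applying 1 / (1 + d) to either matrix gives the similarity form used in the question.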