Let's imagine we have the following data:
import numpy as np
d1 = np.random.uniform(low=0, high=2, size=(3,2))
d2 = np.random.uniform(low=3, high=5, size=(3,2))
X = np.vstack((d1,d2))
X
pdist(..., metric='seuclidean') computes the standardized Euclidean distance, not the squared Euclidean distance (which is what cal_pdist returns).
From the docs:
Y = pdist(X, 'seuclidean', V=None)
Computes the standardized Euclidean distance. The standardized Euclidean distance between two n-vectors u and v is
$$\sqrt{\sum_i (u_i - v_i)^2 / V[x_i]}$$
V is the variance vector; V[i] is the variance computed over all the i'th components of the points. If not passed, it is automatically computed.
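To make the formula concrete, here is a minimal sketch that evaluates the standardized Euclidean distance by hand and compares it with pdist. The seeded generator and the names V, d_scipy and d_manual are illustrative choices rather than part of the original post. V is passed explicitly as the per-column sample variance (ddof=1, an assumption made here), so the comparison does not depend on how the default is computed.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Reproducible stand-in for the random data above (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
d1 = rng.uniform(low=0, high=2, size=(3, 2))
d2 = rng.uniform(low=3, high=5, size=(3, 2))
X = np.vstack((d1, d2))

# Variance of each column (component). ddof=1 is an assumption made for this
# sketch; V is handed to pdist explicitly so both sides use the same values.
V = np.var(X, axis=0, ddof=1)

# Condensed vector of standardized Euclidean distances from SciPy.
d_scipy = pdist(X, metric='seuclidean', V=V)

# The same quantity straight from the formula, sqrt(sum((u_i - v_i)^2 / V[i])),
# iterating over pairs (i, j) with i < j, which is the order pdist uses.
n = X.shape[0]
d_manual = np.array([
    np.sqrt(np.sum((X[i] - X[j]) ** 2 / V))
    for i in range(n)
    for j in range(i + 1, n)
])

print(np.allclose(d_scipy, d_manual))
```

Each squared difference is divided by the variance of its component before summing, which is why the result differs from a plain or squared Euclidean distance.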
Try passing metric='sqeuclidean', and you will see that both functions return the same result to within rounding error.
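A quick way to check this is sketched below. Since cal_pdist itself is not shown here, the helper squared_dists is a hypothetical stand-in that computes plain squared Euclidean distances with NumPy broadcasting. The seeded generator is also an illustrative choice.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Reproducible stand-in for the random data above (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
d1 = rng.uniform(low=0, high=2, size=(3, 2))
d2 = rng.uniform(low=3, high=5, size=(3, 2))
X = np.vstack((d1, d2))

def squared_dists(X):
    """Hypothetical stand-in for cal_pdist: all pairwise squared Euclidean distances."""
    diff = X[:, None, :] - X[None, :, :]   # shape (n, n, d)
    return np.sum(diff ** 2, axis=-1)      # shape (n, n)

D_manual = squared_dists(X)

# pdist returns a condensed vector; squareform expands it to an (n, n) matrix.
D_sq = squareform(pdist(X, metric='sqeuclidean'))
D_std = squareform(pdist(X, metric='seuclidean'))

# Compare the hand-computed matrix against each metric.
print("matches sqeuclidean:", np.allclose(D_manual, D_sq))
print("matches seuclidean: ", np.allclose(D_manual, D_std))
```

Comparing with np.allclose instead of == allows for the small floating-point differences mentioned above.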