I have the following Pandas DataFrame:
In [31]:
import pandas as pd
sample = pd.DataFrame({'Sym1': ['a','a','a','d'],'Sym2':['a','c','b','
This is an old question, but SciPy has a function that does this:
from scipy.spatial.distance import pdist, squareform
distances = pdist(sample.values, metric='euclidean')
dist_matrix = squareform(distances)
pdist operates on NumPy arrays, and DataFrame.values is the underlying NumPy ndarray representation of the data frame. The metric argument lets you select one of several built-in distance metrics, or you can pass in any function of two row vectors to use a custom distance. It's very powerful and, in my experience, very fast. The result is a "flat" array that holds only the upper triangle of the distance matrix (because it's symmetric), not including the diagonal (because it's always 0). squareform then expands this flattened form into a full square matrix.
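To make the flat versus square forms concrete, here is a minimal sketch. It assumes a purely numeric frame, since the Euclidean metric needs numeric input. The `points` frame, its labels, and the `manhattan` helper are hypothetical stand-ins for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical numeric data standing in for the question's frame.
points = pd.DataFrame(
    {'x': [0.0, 3.0, 0.0], 'y': [0.0, 4.0, 8.0]},
    index=['p', 'q', 'r'],
)

# Condensed ("flat") form: n rows give n*(n-1)/2 distances,
# ordered (p,q), (p,r), (q,r).
condensed = pdist(points.values, metric='euclidean')

# Full symmetric matrix with zeros on the diagonal.
full = squareform(condensed)

# Optional: put the row labels back so lookups read naturally.
dist_df = pd.DataFrame(full, index=points.index, columns=points.index)

# Custom metric: any callable that takes two 1-D arrays and returns a number.
def manhattan(u, v):
    return np.abs(u - v).sum()

custom = pdist(points.values, metric=manhattan)
```

With the labels restored, `dist_df.loc['p', 'r']` looks up a single pairwise distance. A Python callable is typically much slower than the built-in metric names, so it is probably worth preferring a built-in such as 'cityblock' when one fits.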
The docs have more info, including a mathematical rundown of the many built-in distance functions.