Efficient implementation of pairwise distance computation between observations for mixed numeric and categorical data
Question: I am working on a data science project in which I have to compute the Euclidean distance between every pair of observations in a dataset. Because the datasets are very large, I need an implementation of pairwise distance computation that is efficient in both memory usage and computation time. One solution is to use the pdist function from SciPy, which returns the result as a condensed 1D array with one entry per pair of observations and no duplicates. However, this function is not able to deal with categorical
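To make the condensed 1D layout concrete, here is a minimal standard-library sketch for purely numeric data. The helper names `condensed_euclidean` and `condensed_index` are hypothetical and not part of SciPy; the sketch only assumes the pair ordering (0,1), (0,2), ..., (1,2), ... that `scipy.spatial.distance.pdist` uses for its output.

```python
import math
from itertools import combinations


def condensed_euclidean(rows):
    """Euclidean distance for every pair i < j, stored as a flat list.

    Hypothetical helper: pairs are visited in the order
    (0,1), (0,2), ..., (0,n-1), (1,2), ..., so each pair is stored once.
    """
    return [math.dist(a, b) for a, b in combinations(rows, 2)]


def condensed_index(i, j, n):
    """Position of the pair (i, j), with i != j, in the flat list for n rows."""
    if i > j:
        i, j = j, i  # distances are symmetric, so order the pair
    return n * i - i * (i + 1) // 2 + (j - i - 1)


# Four toy observations with two numeric features each.
X = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
n = len(X)
d = condensed_euclidean(X)

# n * (n - 1) / 2 entries instead of a full n x n matrix.
print(len(d))
# Distance between observations 1 and 2, looked up without a square matrix.
print(d[condensed_index(1, 2, n)])
```

Storing n(n-1)/2 values instead of n² roughly halves the memory footprint, which is why the condensed form matters for very large datasets. For real workloads, SciPy's vectorized `pdist` would be far faster than this pure-Python loop.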