pairwise-distance

Efficient implementation of pairwise distances computation between observations for mixed numeric and categorical data

Submitted by 送分小仙女 on 2021-02-07 04:07:15
Question: I am working on a data science project in which I have to compute the Euclidean distance between every pair of observations in a dataset. Because the datasets are very large, I need an implementation of pairwise distance computation that is efficient in both memory usage and computation time. One option is the pdist function from SciPy, which returns the result as a 1D array in which each pair appears only once. However, this function is not able to deal with categorical
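For readers unfamiliar with the 1D result mentioned in the question, here is a minimal sketch of how `scipy.spatial.distance.pdist` lays out its output for purely numeric data. The toy array `X` and the helper `condensed_index` are illustrative assumptions, not part of the original post.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Toy numeric data (assumed for illustration): 3 observations, 2 features.
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# pdist returns a "condensed" 1D array of length n*(n-1)/2,
# ordered as pairs (0,1), (0,2), ..., (1,2), ...
condensed = pdist(X, metric="euclidean")

def condensed_index(i, j, n):
    """Hypothetical helper: position of pair (i, j) in the condensed array."""
    if i == j:
        raise ValueError("no entry is stored for the diagonal")
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)

# squareform expands to the full symmetric n x n matrix when it is
# really needed, which roughly doubles the memory used.
square = squareform(condensed)
```

The condensed form stores each pair once, so it needs about half the memory of the square matrix. It still grows quadratically with the number of observations, which matters for the very large datasets the question describes.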

Efficient implementation of pairwise distances computation between observations for mixed numeric and categorical data

Submitted by 和自甴很熟 on 2021-02-07 04:02:09
Question: I am working on a data science project in which I have to compute the Euclidean distance between every pair of observations in a dataset. Because the datasets are very large, I need an implementation of pairwise distance computation that is efficient in both memory usage and computation time. One option is the pdist function from SciPy, which returns the result as a 1D array in which each pair appears only once. However, this function is not able to deal with categorical
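The excerpt is cut off before any solution is given, so the following is only one possible workaround and not the method from the original thread. It assumes the categorical columns can be one-hot encoded so that `pdist` can treat everything as numeric. The function names `one_hot` and `mixed_pdist` and the 1/sqrt(2) scaling are assumptions made for this sketch.

```python
import numpy as np
from scipy.spatial.distance import pdist

def one_hot(column):
    """Hypothetical helper: one-hot encode a 1D array of category labels."""
    _, codes = np.unique(column, return_inverse=True)
    codes = codes.ravel()
    return np.eye(codes.max() + 1)[codes]

def mixed_pdist(numeric, categorical):
    """Condensed Euclidean distances over numeric plus one-hot columns.

    Assumption: each one-hot block is scaled by 1/sqrt(2), so a category
    mismatch adds exactly 1 to the squared distance.
    """
    numeric = np.asarray(numeric, dtype=float)
    categorical = np.asarray(categorical)
    blocks = [numeric]
    for col in categorical.T:  # iterate over categorical columns
        blocks.append(one_hot(col) / np.sqrt(2.0))
    return pdist(np.hstack(blocks), metric="euclidean")

# Toy data (assumed): one numeric feature, one categorical feature.
numeric = [[0.0], [0.0], [3.0]]
categorical = [["a"], ["b"], ["a"]]
distances = mixed_pdist(numeric, categorical)
```

One-hot encoding widens the feature matrix by the total number of distinct categories, so this sketch trades extra memory for the ability to reuse `pdist` unchanged. Whether that trade-off is acceptable depends on the cardinality of the categorical columns.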