Distance matrix for custom distance

…衆ロ難τιáo~ submitted on 2020-01-28 11:27:31

Question


As I understand it, the scipy function scipy.spatial.distance_matrix returns the Minkowski distance between every pair of vectors drawn from the two provided matrices of vectors. Is there a way to get the same kind of result for a different distance? Something that would look like distance_matrix(X, Y, distance_function)?

I assume that scipy does some sort of optimization under the hood. Since I am dealing with very large vectors, I would rather not lose the benefit of these optimizations by implementing my own distance_matrix function.


Answer 1:


It is quite straightforward to implement this yourself.

Also, the performance will very likely be better than that of the distance functions already implemented in scipy.

Most distance functions apply one function to every pair of corresponding elements and sum the results, e.g. (A_ik-B_jk)**n for the Minkowski distance (strictly abs(A_ik-B_jk)**n, which only matters for odd n), and then apply a second function to the accumulated sum, e.g. acc**(1/n).
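To make the inner/outer split concrete, here is a minimal pure-NumPy sketch of the same pattern. The helper name pairwise_reference is hypothetical (it is not part of scipy or numba), and the broadcasting approach allocates a full (rows of A) x (rows of B) x (features) array, so it is only meant as a small-input reference for checking results, not as a fast implementation.

```python
import numpy as np

def pairwise_reference(A, B, kernel_inner, kernel_outer):
    """Hypothetical helper: out[i, j] = kernel_outer(sum_k kernel_inner(A[i, k], B[j, k]))."""
    # Broadcast to shape (len(A), len(B), n_features); memory-hungry, small inputs only.
    elementwise = kernel_inner(A[:, None, :], B[None, :, :])
    # Sum over the feature axis, then apply the outer kernel to each accumulated value.
    return kernel_outer(elementwise.sum(axis=-1))

A = np.array([[0.0, 0.0], [1.0, 1.0]])
B = np.array([[3.0, 4.0]])

# Euclidean distance: inner squares the difference, outer takes the square root.
euclid = pairwise_reference(A, B, lambda a, b: (a - b) ** 2, np.sqrt)
# Manhattan distance: inner takes the absolute difference, outer is the identity.
manhattan = pairwise_reference(A, B, lambda a, b: np.abs(a - b), lambda acc: acc)

print(euclid)
print(manhattan)
```

Any distance that can be written as "sum of an element-wise term, then one final transformation" fits this shape; distances based on a maximum, such as Chebyshev, do not.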

Template function

You don't have to change anything here to implement various distance functions.

import numpy as np
import numba as nb

def gen_cust_dist_func(kernel_inner,kernel_outer,parallel=True):
    #Compile both kernels so they can be called from the compiled loop below
    kernel_inner_nb=nb.njit(kernel_inner,fastmath=True)
    kernel_outer_nb=nb.njit(kernel_outer,fastmath=True)

    def cust_dot_T(A,B):
        #A and B must have the same number of columns (features)
        assert B.shape[1]==A.shape[1]

        out=np.empty((A.shape[0],B.shape[0]),dtype=A.dtype)
        #prange runs in parallel over the rows of A if parallel=True, otherwise it acts like range
        for i in nb.prange(A.shape[0]):
            for j in range(B.shape[0]):
                acc=0.
                for k in range(A.shape[1]):
                    acc+=kernel_inner_nb(A[i,k],B[j,k])
                out[i,j]=kernel_outer_nb(acc)
        return out

    return nb.njit(cust_dot_T,fastmath=True,parallel=parallel)

Examples and Timings

#Example: implement a Minkowski distance and a Euclidean distance
#Minkowski distance p=20 (p is even, so no abs() is needed in the inner kernel)
inner=lambda A,B:(A-B)**20
outer=lambda acc:acc**(1./20)
my_minkowski_dist=gen_cust_dist_func(inner,outer,parallel=True)

#Euclidean distance
inner=lambda A,B:(A-B)**2
outer=lambda acc:np.sqrt(acc)
my_euclidian_dist=gen_cust_dist_func(inner,outer,parallel=True)

from scipy.spatial.distance import cdist

A=np.random.rand(1000,50)
B=np.random.rand(1000,50)

#Minkowski p=20
%timeit res_1=cdist(A,B,'minkowski',p=20)
#1.44 s ± 8.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit res_2=my_minkowski_dist(A,B)
#10.8 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
res_1=cdist(A,B,'minkowski',p=20)
res_2=my_minkowski_dist(A,B)
print(np.allclose(res_1,res_2))
#True

#Euclidean
%timeit res_1=cdist(A,B,'euclidean')
#39.3 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit res_2=my_euclidian_dist(A,B)
#3.61 ms ± 22.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
res_1=cdist(A,B,'euclidean')
res_2=my_euclidian_dist(A,B)
print(np.allclose(res_1,res_2))
#True
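
For a distance that has no built-in scipy name, one way to check a custom kernel is scipy's cdist, which also accepts a Python callable as its metric. The sketch below uses a hypothetical manhattan function and small random inputs purely for illustration. The callable is invoked once per pair of rows, so this route is convenient for verification but can be far slower than a compiled loop on large inputs.

```python
import numpy as np
from scipy.spatial.distance import cdist

def manhattan(u, v):
    # Hypothetical custom metric: u and v are 1-D rows taken from X and Y.
    return np.abs(u - v).sum()

rng = np.random.default_rng(0)
X = rng.random((5, 3))
Y = rng.random((4, 3))

D_callable = cdist(X, Y, metric=manhattan)    # Python callable, evaluated pair by pair
D_builtin = cdist(X, Y, metric='cityblock')   # built-in equivalent, used as a cross-check

print(D_callable.shape)
print(np.allclose(D_callable, D_builtin))
```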


Source: https://stackoverflow.com/questions/58747022/distance-matrix-for-custom-distance
