Is there a specific use of pdist function of scipy for some particular indexes?

Submitted by 让人想犯罪 __ on 2019-12-12 02:46:46

Question


My question is about the pdist function in scipy.spatial.distance. I need to calculate the Hamming distances between one 1x64 vector and each of millions of other 1x64 vectors stored in a 2D array, but I cannot do that with pdist, because it returns the Hamming distances between every pair of vectors in the same 2D array. Is there a way to make it calculate the Hamming distances between the vector at a specific index and all the others?

Here is my current code. I use a 1000x64 array for now because a memory error occurs with big arrays.

import numpy as np
from scipy.spatial.distance import pdist


ph = np.load('little.npy')

print(pdist(ph, 'hamming').shape)

and the output is

(499500,)

little.npy contains a 1000x64 array. For example, if I only want the Hamming distances between the 31st vector and all the others, what should I do?


Answer 1:


You can use cdist. For example,

In [101]: from scipy.spatial.distance import cdist

In [102]: x
Out[102]: 
array([[0, 1, 1, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 1, 0],
       [0, 0, 0, 1, 1, 1, 0, 0],
       [1, 0, 1, 1, 0, 1, 1, 0],
       [1, 0, 1, 1, 0, 1, 1, 1],
       [0, 1, 0, 1, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 1, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 0, 0, 1, 1, 1, 0],
       [1, 0, 0, 1, 1, 0, 0, 1]])

In [103]: index = 3

In [104]: cdist(x[index:index+1], x, 'hamming')
Out[104]: 
array([[ 0.625,  0.375,  0.5  ,  0.   ,  0.125,  0.75 ,  0.375,  0.375,
         0.5  ,  0.625]])

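That gives the Hamming distance between the row at index 3 and every row of x (including the row at index 3 itself, which is why the entry at position 3 is 0). SciPy's 'hamming' metric is the fraction of positions that differ, so 0.625 means 5 of the 8 entries differ. The result is a 2D array with a single row. You might want to immediately pull out that row so the result is 1D: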

In [105]: cdist(x[index:index+1], x, 'hamming')[0]
Out[105]: 
array([ 0.625,  0.375,  0.5  ,  0.   ,  0.125,  0.75 ,  0.375,  0.375,
        0.5  ,  0.625])

I used x[index:index+1] instead of just x[index] so that the input is a 2D array (with just a single row):

In [106]: x[index:index+1]
Out[106]: array([[1, 0, 1, 1, 0, 1, 1, 0]])

You'll get an error if you use x[index], because that is a 1D array and cdist requires both of its inputs to be 2D.
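
As a cross-check on the approach above, here is a minimal sketch that runs the same cdist call on a 1000x64 array and compares it with a plain NumPy computation. The random array is only a stand-in for the contents of little.npy, and the names rng, d_cdist, d_numpy and n_diff are made up for this illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative data only: a random 0/1 array standing in for little.npy.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(1000, 64))
index = 31

# Same call as in the answer: one 2D row compared against every row.
d_cdist = cdist(x[index:index+1], x, 'hamming')[0]

# Plain NumPy equivalent: broadcast the chosen row against all rows
# and take the fraction of positions that differ.
d_numpy = (x != x[index]).mean(axis=1)

# Count of differing positions instead of the fraction.
n_diff = (x != x[index]).sum(axis=1)

print(d_cdist.shape)
print(np.allclose(d_cdist, d_numpy))
```

Either way the result has one entry per row, so it grows linearly with the number of vectors, whereas pdist returns n*(n-1)/2 values (499500 for n = 1000), which is where the memory error with big arrays comes from.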



Source: https://stackoverflow.com/questions/38995263/is-there-a-specific-use-of-pdist-function-of-scipy-for-some-particular-indexes
