python numpy pairwise edit-distance

Posted by 烈酒焚心 on 2019-12-05 11:44:43

If you really must use pdist, you first need to convert your strings to a numeric format. If you know that all strings have the same length, you can do this rather easily:

numeric_d = d.view(np.uint8).reshape((len(d),-1))

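This simply views your array of strings as one long array of uint8 bytes, then reshapes it so that each original string is on a row by itself. It relies on d holding fixed-width byte strings (dtype S); a unicode array (dtype U) stores 4 bytes per character and would give 4 times as many columns. In your example, this would look like: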

In [18]: d.view(np.uint8).reshape((len(d),-1))
Out[18]:
array([[84, 84, 84, 84, 84],
       [65, 84, 84, 84, 84],
       [67, 84, 84, 84, 84],
       [71, 84, 84, 84, 84],
       [84, 65, 84, 84, 84],
       [65, 65, 84, 84, 84],
       [67, 65, 84, 84, 84],
       [71, 65, 84, 84, 84],
       [84, 67, 84, 84, 84],
       [65, 67, 84, 84, 84]], dtype=uint8)
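
For reference, here is a small sketch of that conversion that you can run on its own. The original d is not shown, so the array below is an assumption decoded from the ASCII codes in the output (84 = T, 65 = A, 67 = C, 71 = G). The unicode case at the end is also an illustrative addition.

```python
import numpy as np

# Assumed input: byte strings decoded from the ASCII codes in the output
# above. The original d is not shown in the answer.
d = np.array([b"TTTTT", b"ATTTT", b"CTTTT", b"GTTTT", b"TATTT",
              b"AATTT", b"CATTT", b"GATTT", b"TCTTT", b"ACTTT"])

# Each element occupies 5 bytes, so the uint8 view has 5 columns per row.
numeric_d = d.view(np.uint8).reshape((len(d), -1))

# A unicode array stores 4 bytes per character, so cast it to a byte-string
# dtype first (this only works for ASCII-only text).
d_unicode = np.array(["TTTTT", "ATTTT"])
numeric_u = d_unicode.astype("S").view(np.uint8).reshape((len(d_unicode), -1))
```

If the strings differ in length, NumPy pads the shorter ones with zero bytes, so the distance function would have to strip those before comparing.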

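Then, you can use pdist as you normally would. Just make sure that your editdist function expects arrays of integers rather than strings. You can quickly convert the new inputs back to byte strings by calling .tobytes() (the older .tostring() is a deprecated alias for the same method). On Python 3 the result is a bytes object, so decode it if the rest of the function needs str:

def editdist(x, y):
    s1 = x.tobytes()
    s2 = y.tobytes()
    # ... rest of function as before ...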
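
Putting the pieces together, a minimal end-to-end sketch could look like the following. The levenshtein helper is a hypothetical stand-in for the elided body of editdist, which the answer does not show, and the four input strings are made up for illustration. The cast to uint8 inside editdist is a precaution in case the SciPy version in use hands the rows over as floats.

```python
import numpy as np
from scipy.spatial.distance import pdist

def levenshtein(s1, s2):
    # Hypothetical stand-in for the elided function body: classic
    # dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        cur = [i]
        for j, c2 in enumerate(s2, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (c1 != c2)))  # substitution
        prev = cur
    return prev[-1]

def editdist(x, y):
    # Rows arrive as numeric arrays; cast to uint8 so every element maps
    # back to exactly one byte, whatever dtype pdist passes in.
    s1 = np.asarray(x, dtype=np.uint8).tobytes()
    s2 = np.asarray(y, dtype=np.uint8).tobytes()
    return levenshtein(s1, s2)

# Made-up example input (fixed-width byte strings).
d = np.array([b"TTTTT", b"ATTTT", b"AATTT", b"TCTTT"])
numeric_d = d.view(np.uint8).reshape((len(d), -1))

# Condensed distances in the order (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
dist = pdist(numeric_d, editdist)
```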

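import numpy as np

# Pure-Python alternative to pdist: f is called on the original items,
# so strings can be passed in without any numeric conversion.
def my_pdist(data, f):
    N = len(data)
    # Integer division: N*(N-1)/2 is a float on Python 3, which np.empty rejects.
    matrix = np.empty(N * (N - 1) // 2)
    ind = 0
    for i in range(N):
        for j in range(i + 1, N):
            matrix[ind] = f(data[i], data[j])
            ind += 1
    return matrix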
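
As a usage sketch, the condensed vector returned by a function like this can be expanded into a full symmetric matrix with np.triu_indices, which lists the (i, j) pairs in the same order as the nested loops. The hamming metric and the words list below are hypothetical placeholders chosen to keep the example short; any two-argument distance function would do.

```python
import numpy as np

def my_pdist(data, f):
    # Same pairwise loop as above, using integer division for the length.
    n = len(data)
    out = np.empty(n * (n - 1) // 2)
    ind = 0
    for i in range(n):
        for j in range(i + 1, n):
            out[ind] = f(data[i], data[j])
            ind += 1
    return out

def hamming(a, b):
    # Hypothetical placeholder metric: number of differing positions.
    return sum(c1 != c2 for c1, c2 in zip(a, b))

words = ["TTTTT", "ATTTT", "AATTT"]     # plain strings, no conversion needed
condensed = my_pdist(words, hamming)    # pairs (0,1), (0,2), (1,2)

# Expand the condensed vector into a symmetric n x n matrix.
n = len(words)
square = np.zeros((n, n))
rows, cols = np.triu_indices(n, k=1)
square[rows, cols] = condensed
square[cols, rows] = condensed
```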