pdist

Numpy array of distances to list of (row,col,distance)

被刻印的时光 ゝ submitted on 2020-01-06 15:19:02
Question: I have an nd array that looks as follows:
    [[ 0.          1.73205081  6.40312424  7.21110255  2.44948974]
     [ 1.73205081  0.          5.09901951  5.91607978  1.        ]
     [ 6.40312424  5.09901951  0.          1.          4.35889894]
     [ 7.21110255  5.91607978  1.          0.          5.09901951]
     [ 2.44948974  1.          4.35889894  5.09901951  0.        ]]
Each element in this array is a distance, and I need to turn it into a list of (row, col, distance) tuples as follows:
    l = [(0,0,0),(0,1, 1.73205081),(0,2, 6.40312424),...,(1,0, 1.73205081),(1,1,0),...,(4,4,0)]
Additionally, it would be ...
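One possible way to produce such a list, shown as a minimal sketch rather than the accepted answer: build the row and column index grids with `np.indices` and zip them with the flattened distances. The function name `matrix_to_triples` is made up for this illustration, and the sample matrix is the top-left 3x3 corner of the array in the question.

```python
import numpy as np

def matrix_to_triples(dist):
    """Return [(row, col, distance), ...] in row-major order."""
    dist = np.asarray(dist)
    # rows[i, j] == i and cols[i, j] == j for every cell of the matrix
    rows, cols = np.indices(dist.shape)
    # .tolist() converts NumPy scalars to plain Python ints and floats
    return list(zip(rows.ravel().tolist(),
                    cols.ravel().tolist(),
                    dist.ravel().tolist()))

# Top-left 3x3 corner of the matrix from the question
dist = np.array([
    [0.0,        1.73205081, 6.40312424],
    [1.73205081, 0.0,        5.09901951],
    [6.40312424, 5.09901951, 0.0],
])
triples = matrix_to_triples(dist)
```

The result keeps both (i, j) and (j, i) as well as the zero diagonal, matching the list shape requested in the question.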

Calculate two dimensional pairwise distance on a large numpy three dimensional array

爷,独闯天下 submitted on 2019-12-22 08:16:27
Question: I have a numpy array of 3 million points in the form of [pt_id, x, y, z]. The goal is to return all pairs of points that have a Euclidean distance between two numbers, min_d and max_d. The Euclidean distance is computed on x and y only, not on z. However, I'd like to preserve the array with pt_id_from, pt_id_to, distance attributes. I'm using scipy's dist to calculate the distances:
    import scipy.spatial.distance
    coords_arr = np.array([['pt1', 2452130.000, 7278106.000, 25.000], ['pt2', 2479539.000, ...
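A full pairwise matrix for 3 million points would not fit in memory, so one common alternative is a k-d tree. The following is a sketch under assumptions, not the answer from the original thread: it swaps in `scipy.spatial.cKDTree.query_pairs` to find all pairs within `max_d` on the x/y plane, then filters out pairs closer than `min_d`. The ids and coordinates are kept in separate arrays (an assumption; the question stores them in one mixed-type array), the function name `pairs_in_range` is hypothetical, and the threshold values are invented for the demonstration.

```python
import numpy as np
from scipy.spatial import cKDTree

def pairs_in_range(ids, xy, min_d, max_d):
    """Return (id_from, id_to, distance) for pairs with min_d <= d <= max_d."""
    xy = np.asarray(xy, dtype=float)
    tree = cKDTree(xy)
    # Index pairs (i, j) with i < j whose distance is at most max_d
    pairs = tree.query_pairs(r=max_d, output_type="ndarray")
    if len(pairs) == 0:
        return []
    d = np.linalg.norm(xy[pairs[:, 0]] - xy[pairs[:, 1]], axis=1)
    keep = d >= min_d
    return [(ids[i], ids[j], float(dist))
            for (i, j), dist in zip(pairs[keep], d[keep])]

# First three points from the question; z is dropped because it is not used
ids = ["pt1", "pt2", "pt3"]
xy = np.array([
    [2452130.0, 7278106.0],
    [2479539.0, 7287455.0],
    [2479626.0, 7287458.0],
])
result = pairs_in_range(ids, xy, min_d=10.0, max_d=100.0)  # thresholds are made up
```

Each pair is reported once, with the lower index first. If both directions are needed, the list can be mirrored afterwards.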

Is there a specific use of pdist function of scipy for some particular indexes?

让人想犯罪 __ submitted on 2019-12-12 02:46:46
Question: My question is about the use of the pdist function of scipy.spatial.distance. I have to calculate the Hamming distances between one 1x64 vector and each of millions of other 1x64 vectors stored in a 2D array, but I cannot do it with pdist, because it returns the Hamming distances between every pair of vectors inside the same 2D array. I wonder if there is any way to make it calculate the Hamming distances between the vector at a specific index and all the others. Here is my current code, I use ...
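For a one-against-all comparison, `scipy.spatial.distance.cdist` is usually a better fit than `pdist`, since it compares the rows of one array with the rows of another. The sketch below is an illustration under assumptions: the data are random 0/1 vectors generated here, and `hamming_to_all` is a hypothetical helper name. Note that SciPy's `hamming` metric returns the fraction of differing positions, not the count.

```python
import numpy as np
from scipy.spatial.distance import cdist

def hamming_to_all(data, idx):
    """Hamming distance (as a fraction) from row idx to every row of data."""
    # data[idx:idx + 1] keeps the query as a 2D array of shape (1, n_bits)
    return cdist(data[idx:idx + 1], data, metric="hamming")[0]

# Randomly generated stand-in for the 64-bit vectors in the question
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(1000, 64), dtype=np.uint8)

idx = 3
frac = hamming_to_all(data, idx)

# The same result as raw mismatch counts, using plain NumPy broadcasting
counts = np.count_nonzero(data != data[idx], axis=1)
```

Multiplying `frac` by the vector length (64 here) gives the mismatch counts, which the NumPy line computes directly without SciPy.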

python numpy pairwise edit-distance

不问归期 submitted on 2019-12-07 05:46:31
Question: I have a numpy array of strings, and I want to calculate the pairwise edit distance between each pair of elements using scipy.spatial.distance.pdist from http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.spatial.distance.pdist.html. A sample of my array is as follows:
    >>> d[0:10]
    array(['TTTTT', 'ATTTT', 'CTTTT', 'GTTTT', 'TATTT', 'AATTT', 'CATTT', 'GATTT', 'TCTTT', 'ACTTT'], dtype='|S5')
However, since it doesn't have an 'editdistance' option, I ...
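`pdist` expects a numeric 2D array, so passing strings directly tends to fail. A commonly used workaround, sketched here under assumptions and not taken from the original answer, is to run `pdist` over a column of row indices and look the strings up inside the custom metric. The `levenshtein` function is a plain-Python implementation written for this example, and the sample uses the first five strings from the question.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

strings = ['TTTTT', 'ATTTT', 'CTTTT', 'GTTTT', 'TATTT']

# pdist sees only the indices; the metric maps them back to strings
index_col = np.arange(len(strings), dtype=float).reshape(-1, 1)
condensed = pdist(
    index_col,
    lambda u, v: levenshtein(strings[int(u[0])], strings[int(v[0])]),
)
D = squareform(condensed)  # full symmetric matrix with a zero diagonal
```

A Python callable is invoked once per pair, so this is only practical for modest array sizes.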

Calculate two dimensional pairwise distance on a large numpy three dimensional array

只谈情不闲聊 submitted on 2019-12-05 16:23:34
I have a numpy array of 3 million points in the form of [pt_id, x, y, z]. The goal is to return all pairs of points that have a Euclidean distance between two numbers, min_d and max_d. The Euclidean distance is computed on x and y only, not on z. However, I'd like to preserve the array with pt_id_from, pt_id_to, distance attributes. I'm using scipy's dist to calculate the distances:
    import scipy.spatial.distance
    coords_arr = np.array([['pt1', 2452130.000, 7278106.000, 25.000], ['pt2', 2479539.000, 7287455.000, 4.900], ['pt3', 2479626.000, 7287458.000, 10.000], ['pt4', 2484097.000, 7292784.000, 8...

python numpy pairwise edit-distance

烈酒焚心 submitted on 2019-12-05 11:44:43
I have a numpy array of strings, and I want to calculate the pairwise edit distance between each pair of elements using scipy.spatial.distance.pdist from http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.spatial.distance.pdist.html. A sample of my array is as follows:
    >>> d[0:10]
    array(['TTTTT', 'ATTTT', 'CTTTT', 'GTTTT', 'TATTT', 'AATTT', 'CATTT', 'GATTT', 'TCTTT', 'ACTTT'], dtype='|S5')
However, since it doesn't have an 'editdistance' option, I want to supply a custom distance function. I tried this and got the following error:
    >>> import ...

String Distance Matrix in Python using pdist

孤街醉人 submitted on 2019-11-29 15:25:32
Question: How do I calculate a Jaro-Winkler distance matrix of strings in Python? I have a large array of hand-entered strings (names and record numbers) and I'm trying to find duplicates in the list, including duplicates that may have slight variations in spelling. A response to a similar question suggested using SciPy's pdist function with a custom distance function. I've tried to implement this solution with the jaro_winkler function in the Levenshtein package. The problem with this is that the jaro ...
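One way to sidestep `pdist`'s numeric-input requirement is to build the condensed distance vector by hand and let `squareform` expand it. The sketch below is illustrative only: it uses `difflib.SequenceMatcher.ratio` from the standard library as a stand-in similarity, which is not Jaro-Winkler, and the sample names are invented. Any function returning a similarity in [0, 1], such as the package function the question mentions, could be passed as `sim` instead.

```python
from difflib import SequenceMatcher
from itertools import combinations

import numpy as np
from scipy.spatial.distance import squareform

def similarity(a, b):
    # Stand-in similarity in [0, 1]; NOT Jaro-Winkler
    return SequenceMatcher(None, a, b).ratio()

def string_distance_matrix(strings, sim=similarity):
    """Square matrix of 1 - similarity for every pair of strings."""
    # combinations() yields (0,1), (0,2), ..., (1,2), ...: the same order
    # as the condensed vector that pdist produces
    condensed = [1.0 - sim(a, b) for a, b in combinations(strings, 2)]
    return squareform(np.array(condensed))

# Hypothetical hand-entered names
names = ["Jon Smith", "John Smith", "Alice Wong"]
D = string_distance_matrix(names)

# Flag likely duplicates: pairs below a made-up distance threshold
threshold = 0.2
likely_dupes = [(names[i], names[j])
                for i, j in combinations(range(len(names)), 2)
                if D[i, j] < threshold]
```

The number of pairs grows quadratically with the list length, so for a large array some form of blocking (for example, comparing only strings that share a first letter) may be needed before this becomes practical.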