euclidean-distance

Why cdist from scipy.spatial.distance is so fast?

*爱你&永不变心* posted on 2020-05-31 06:11:20
Question: I wanted to create a distance (proximity) matrix for 10060 records/points, where each record/point has 23 attributes, using Euclidean distance as the metric. I wrote code using nested for loops to calculate the distance between each pair of points (leading to n(n-1)/2 computations). It took a long time (about 8 minutes). When I used cdist, it took far less time (just 3 seconds!). When I looked at the source code, cdist also uses nested for loops, and moreover it makes n^2 computations (which is
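A minimal sketch of the comparison described in the question, under stated assumptions: the array size (200 points instead of 10060) and the helper name `pairwise_loops` are hypothetical and chosen only so the example runs quickly. One well-known factor in the speed gap is that the loops inside `cdist` run in compiled code, whereas the loops below run one step at a time in the Python interpreter; the excerpt above is truncated, so this is not necessarily the full answer given on the original page.

```python
import numpy as np
from scipy.spatial.distance import cdist


def pairwise_loops(points):
    """Pure-Python nested loops over the n(n-1)/2 unique pairs (hypothetical helper)."""
    n = len(points)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.sqrt(np.sum((points[i] - points[j]) ** 2))
            dist[i, j] = d  # fill both halves of the symmetric matrix
            dist[j, i] = d
    return dist


rng = np.random.default_rng(0)
points = rng.normal(size=(200, 23))  # 23 attributes per point, as in the question

loop_result = pairwise_loops(points)
cdist_result = cdist(points, points, metric="euclidean")

# Both approaches should agree up to floating-point error.
print(np.allclose(loop_result, cdist_result))
```

Timing both calls (for example with `time.perf_counter`) on progressively larger inputs is one way to see how the gap grows with n.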

Efficiently Calculating a Euclidean Dist Matrix in Numpy?

寵の児 posted on 2020-05-30 08:13:01
Question: I have a large array (~20k entries) of two-dimensional data, and I want to calculate the pairwise Euclidean distance between all entries. I need the output to have the standard square form. Multiple solutions for this problem have been proposed, but none of them seems to work efficiently for large arrays. The method using complex transposing fails for large arrays. SciPy's pdist seems to be the most efficient method using numpy. However, using squareform on the result to obtain a square matrix makes
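For reference, a small sketch of the `pdist` plus `squareform` route the question mentions. The array size (1000 points) is an assumption made to keep the example light. The last lines only estimate the memory a full 20,000 by 20,000 float64 matrix would need, which is part of why any square-form result is costly at that scale.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
data = rng.random((1000, 2))  # hypothetical stand-in for the ~20k 2-D entries

# Condensed form: one value per unique pair, length n*(n-1)/2.
condensed = pdist(data, metric="euclidean")

# Square form: symmetric (n, n) matrix with zeros on the diagonal.
square = squareform(condensed)

print(condensed.shape, square.shape)

# Rough memory estimate for the full square matrix at n = 20,000 (float64).
n_full = 20_000
square_gigabytes = n_full * n_full * 8 / 1e9
print(square_gigabytes)
```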

Efficiently Calculating a Euclidean Dist Matrix in Numpy?

时光总嘲笑我的痴心妄想 posted on 2020-05-30 08:12:42
Question: I have a large array (~20k entries) of two-dimensional data, and I want to calculate the pairwise Euclidean distance between all entries. I need the output to have the standard square form. Multiple solutions for this problem have been proposed, but none of them seems to work efficiently for large arrays. The method using complex transposing fails for large arrays. SciPy's pdist seems to be the most efficient method using numpy. However, using squareform on the result to obtain a square matrix makes

If/else if: pick first matching record within set distance only after first condition is not met in R

强颜欢笑 posted on 2020-03-23 23:17:14
Question: I would like to pick the closest previous owner within a set distance only after the first search condition isn't met. The locations are called reflo (reference location), and they have corresponding x and y coordinates (called locx and locy, respectively). The conditions: (1) if lifetime_census$reflo==owners$reflo.x[i], then the condition is met; (2) if lifetime_census$reflo!=owners$reflo.x[i], then find the next closest record (within 30 meters); (3) if there is no record within 30 meters, then assign NA

(Speed Challenge) Any faster method to calculate distance matrix between rows of two matrices, in terms of Euclidean distance?

浪子不回头ぞ posted on 2020-02-27 09:25:08
Question: First of all, this is NOT the problem of calculating Euclidean distance between two matrices. Assuming I have two matrices x and y , e.g., set.seed(1) x <- matrix(rnorm(15), ncol=5) y <- matrix(rnorm(20), ncol=5) where > x [,1] [,2] [,3] [,4] [,5] [1,] -0.6264538 1.5952808 0.4874291 -0.3053884 -0.6212406 [2,] 0.1836433 0.3295078 0.7383247 1.5117812 -2.2146999 [3,] -0.8356286 -0.8204684 0.5757814 0.3898432 1.1249309 > y [,1] [,2] [,3] [,4] [,5] [1,] -0.04493361 0.59390132 -1.98935170 -1
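The question is about R, so the following is only an illustrative Python sketch of the same task: a distance matrix between the rows of one matrix and the rows of another. The matrices are random stand-ins with the same shapes as `x` (3 by 5) and `y` (4 by 5) in the question; their values will not match R's `set.seed(1)` output.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
x = rng.normal(size=(3, 5))  # 3 rows, 5 columns, as in the question
y = rng.normal(size=(4, 5))  # 4 rows, 5 columns

# Broadcasting: diff[i, j, :] is x[i] - y[j], giving shape (3, 4, 5).
diff = x[:, None, :] - y[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))  # dist[i, j] = distance from x[i] to y[j]

# SciPy's cdist computes the same row-to-row matrix directly.
dist_scipy = cdist(x, y, metric="euclidean")
print(dist.shape)
```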

calculating average distance of nearest neighbours in pandas dataframe

时光总嘲笑我的痴心妄想 posted on 2020-02-25 03:00:06
Question: I have a set of objects and their positions over time. I would like to get the distance between each car and its nearest neighbour, and calculate an average of this for each time point. An example dataframe is as follows: time = [0, 0, 0, 1, 1, 2, 2] x = [216, 218, 217, 280, 290, 130, 132] y = [13, 12, 12, 110, 109, 3, 56] car = [1, 2, 3, 1, 3, 4, 5] df = pd.DataFrame({'time': time, 'x': x, 'y': y, 'car': car}) df x y car time 0 216 13 1 0 218 12 2 0 217 12 3 1 280 110 1 1 290 109 3 2 130 3
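One possible way to approach this, sketched with the example data from the question. The helper name `mean_nearest_neighbour` and the result name `avg_nn` are hypothetical, and returning NaN for a time point with a single car is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

time = [0, 0, 0, 1, 1, 2, 2]
x = [216, 218, 217, 280, 290, 130, 132]
y = [13, 12, 12, 110, 109, 3, 56]
car = [1, 2, 3, 1, 3, 4, 5]
df = pd.DataFrame({'time': time, 'x': x, 'y': y, 'car': car})


def mean_nearest_neighbour(group):
    """Average distance from each car to its nearest neighbour in one time point."""
    xy = group[['x', 'y']].to_numpy(dtype=float)
    if len(xy) < 2:
        return np.nan  # assumption: no neighbour means no defined distance
    d = cdist(xy, xy)
    np.fill_diagonal(d, np.inf)  # ignore each car's zero distance to itself
    return d.min(axis=1).mean()


# One value per time point, indexed by time.
avg_nn = pd.Series({t: mean_nearest_neighbour(g) for t, g in df.groupby('time')})
print(avg_nn)
```

For many cars per time point, a spatial index such as `scipy.spatial.cKDTree` may scale better than the full distance matrix used here.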

R function to calculate nearest neighbor distance given [inconsistent] constraint?

给你一囗甜甜゛ posted on 2020-02-21 05:57:42
Question: I have data consisting of tree growth measurements (diameter and height) for trees at known X & Y coordinates. I'd like to determine the distance to each tree's nearest neighbor of equal or greater size. I've seen other SE questions asking about nearest neighbor calculations (e.g., see here, here, here, here, etc.), but none specify constraints on the nearest neighbor to be searched. Is there a function (or other workaround) that would allow me to determine the distance of a point's nearest
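The question asks for R; the sketch below shows the masking idea in Python only as an illustration. The coordinates and diameters are made up, and treating "size" as diameter alone is an assumption. Trees with no neighbor of equal or greater size get NaN.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical trees: X/Y coordinates and diameters.
xy = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [10.0, 10.0]])
diameter = np.array([10.0, 20.0, 15.0, 5.0])

d = cdist(xy, xy)

# eligible[i, j] is True when tree j is at least as large as tree i.
eligible = diameter[None, :] >= diameter[:, None]
np.fill_diagonal(eligible, False)  # a tree is not its own neighbor

# Distances to ineligible trees are replaced by infinity before taking the minimum.
masked = np.where(eligible, d, np.inf)
nn_dist = masked.min(axis=1)
nn_dist[np.isinf(nn_dist)] = np.nan  # no eligible neighbor at all

print(nn_dist)
```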

Minimize total distance between two sets of points in Python

我是研究僧i posted on 2020-01-21 07:34:46
Question: Given two sets of points in n-dimensional space, how can one map points from one set to the other, such that each point is only used once and the total Euclidean distance between the pairs of points is minimized? For example, import matplotlib.pyplot as plt import numpy as np # create six points in 2d space; the first three belong to set "A" and the # second three belong to set "B" x = [1, 2, 3, 1.8, 1.9, 3.4] y = [2, 3, 1, 2.6, 3.4, 0.4] colors = ['red'] * 3 + ['blue'] * 3 plt.scatter(x, y,
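This is an instance of the linear assignment problem. A minimal sketch using `scipy.optimize.linear_sum_assignment` on the six points from the question follows; the names `set_a`, `set_b` and `cost` are introduced here for illustration, and this is one possible approach rather than necessarily the answer given on the original page.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

x = [1, 2, 3, 1.8, 1.9, 3.4]
y = [2, 3, 1, 2.6, 3.4, 0.4]
points = np.column_stack([x, y])

set_a = points[:3]  # first three points belong to set "A"
set_b = points[3:]  # last three points belong to set "B"

# cost[i, j] is the Euclidean distance from A[i] to B[j].
cost = cdist(set_a, set_b, metric="euclidean")

# Each row (A point) is matched to exactly one column (B point).
row_ind, col_ind = linear_sum_assignment(cost)
total = cost[row_ind, col_ind].sum()

for i, j in zip(row_ind, col_ind):
    print(f"A[{i}] -> B[{j}]  distance {cost[i, j]:.3f}")
print(f"total distance {total:.3f}")
```

When the two sets have different sizes, `linear_sum_assignment` accepts a rectangular cost matrix and leaves the surplus points unmatched.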

Calculate Euclidean Distance within points in numpy array

北慕城南 posted on 2020-01-11 06:07:14
Question: I have an array of 3D points, A = [[x1 y1 z1] [x2 y2 z2] [x3 y3 z3]]. I have to find the Euclidean distance between each pair of points, so that I get an output with only 3 distances: between (row0,row1), (row1,row2) and (row0,row2). I have some code, dist = scipy.spatial.distance.cdist(A,A, 'euclidean'), but it gives the distances in matrix form as dist= [[0 a b] [a 0 c] [b c 0]]. I want the result as [a b c]. Answer 1: You can do something like this: >>> import numpy as np >>> from itertools import combinations >>> A = np
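As an alternative sketch to the truncated answer above: `scipy.spatial.distance.pdist` returns the unique pairwise distances directly in condensed form, ordered (row0,row1), (row0,row2), (row1,row2), which corresponds to [a b c] in the question's matrix. The sample coordinates are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

# Hypothetical 3-D points, one per row.
A = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])

# Condensed distances: [d(0,1), d(0,2), d(1,2)].
condensed = pdist(A, metric="euclidean")
print(condensed)

# Equivalent: take the strict upper triangle of the full cdist matrix.
full = cdist(A, A, "euclidean")
upper = full[np.triu_indices(len(A), k=1)]
```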

Is there a way to calculate the following specified matrix by avoiding loops? in R or Matlab

走远了吗. posted on 2020-01-05 10:31:56
Question: I have an N-by-M matrix X, and I need to calculate an N-by-N matrix Y: Y[i, j] = sum((X[i,] - X[j,]) ^ 2) 0 <= i,j <= N For now, I have to use nested loops to do it with O(n^2). I would like to know if there's a better way, like using matrix operations. More generally, sum(....) can be a function, fun(x1, x2), of which x1, x2 are M-by-1 vectors. Answer 1: You can use expand.grid to get a data.frame of possible pairs: X <- matrix(sample(1:5, 50, replace=TRUE), nrow=10) row.ind <- expand.grid(1
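The question targets R or Matlab; the sketch below uses Python with NumPy purely to illustrate the matrix-operation idea, which relies on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b. The function name `squared_distance_matrix` is hypothetical. The work is still O(N^2 M), but it is done by one matrix product instead of interpreted loops. This trick is specific to the squared-difference sum and does not carry over to an arbitrary fun(x1, x2).

```python
import numpy as np


def squared_distance_matrix(X):
    """Y[i, j] = sum((X[i] - X[j]) ** 2), computed without explicit loops."""
    sq_norms = np.sum(X ** 2, axis=1)  # ||X[i]||^2 for every row
    Y = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.maximum(Y, 0.0)  # clip tiny negatives caused by round-off


rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(10, 5)).astype(float)  # same shape as the answer's example
Y = squared_distance_matrix(X)
print(Y.shape)
```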