nearest-neighbor

Efficient implementation of the Nearest Neighbour Search

Submitted by 扶醉桌前 on 2019-12-21 05:01:08
Question: I am trying to implement an efficient algorithm for the nearest-neighbour search problem. I have read tutorials about some data structures that support operations for this kind of problem (for example, the R-tree, cover tree, etc.), but all of them are difficult to implement. Also, I cannot find sample source code for these data structures. I know C++ and I am trying to solve this problem in that language. Ideally, I need links that describe how to implement these data structures using source
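
Of the structures usually mentioned for this problem, the k-d tree is probably the easiest to implement from scratch. Below is a hedged sketch of the classic build-and-prune nearest-neighbour query, written in Python for brevity rather than the C++ the question asks for; the recursion translates to C++ almost line for line.

```python
import math

def build_kdtree(points, depth=0):
    # Recursively build a k-d tree: split on alternating axes at the median.
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, target, best=None):
    # Depth-first descent with pruning on the splitting plane.
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    if best is None or math.dist(target, point) < math.dist(target, best):
        best = point
    diff = target[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Only visit the far side if the splitting plane is closer than the best so far.
    if abs(diff) < math.dist(target, best):
        best = nearest(far, target, best)
    return best
```

Building is O(n log² n) with this re-sorting scheme, and queries are O(log n) on average for low-dimensional data; in high dimensions the pruning degrades, which is where the R-tree and cover-tree literature comes in.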

Search in 300 million addresses with pg_trgm

Submitted by 家住魔仙堡 on 2019-12-21 04:59:26
Question: I have 300 million addresses in my PostgreSQL 9.3 database and I want to use pg_trgm to fuzzy-search the rows. The final goal is to implement a search function much like Google Maps search. When I used pg_trgm to search these addresses, it took about 30 s to get the results. There are many rows matching the default similarity-threshold condition of 0.3, but I just need about 5 or 10 results. I created a trigram GiST index: CREATE INDEX addresses_trgm_index ON addresses USING gist (address gist_trgm
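
One commonly suggested fix (hedged; the `addresses`/`address` names come from the question): instead of filtering on the similarity threshold, let the GiST index drive a k-nearest-neighbour ordering. pg_trgm's `<->` distance operator can use a trigram GiST index for k-NN scans, so `ORDER BY address <-> query LIMIT 10` returns only the best matches rather than materialising every row above the threshold. A minimal sketch that builds such a query string in Python (the `%s` placeholder is psycopg2-style, an assumption here):

```python
def top_k_address_query(k=10):
    # Hedged sketch: with a GiST trigram index in place, ordering by the
    # "<->" trigram distance with a LIMIT lets PostgreSQL do an index-backed
    # k-NN scan instead of scanning all threshold-matching rows.
    return ("SELECT address, address <-> %s AS dist "
            "FROM addresses "
            "ORDER BY address <-> %s "
            "LIMIT " + str(int(k)))
```

Note that this ordering trick requires the GiST flavour of the index (as created in the question); a GIN trigram index on this PostgreSQL version does not support `<->` ordering.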

Finding the nearest neighbor to a single point in MATLAB

Submitted by 情到浓时终转凉″ on 2019-12-20 06:31:58
Question: I'm trying to do a nearest-neighbour search that yields a single point as the single "nearest neighbour" to another point in MATLAB. I've got the following data: a longitude grid "lon" of size 336x264, and some random point "dxf" within the bounds of the longitude grid. I've tried using MATLAB's "knnsearch" function (https://www.mathworks.com/help/stats/knnsearch.html), but sadly when I use the command idx = knnsearch(lon, dxf) I am met with the error: "Y must be a matrix with 264 columns." Is
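
A hedged reading of the error: knnsearch treats each row as one observation, so a 336x264 matrix is interpreted as 336 points in 264-dimensional space, and the query must then also have 264 columns. If the goal is the nearest single grid value, flattening the grid to a column vector first — `idx = knnsearch(lon(:), dxf)` in MATLAB — should make the shapes agree. The same idea in stdlib Python:

```python
def nearest_grid_value(grid, target):
    # Flatten the 2-D grid to a flat list of scalars (MATLAB's lon(:)),
    # then do a 1-D nearest-neighbour search against the single query value.
    flat = [v for row in grid for v in row]
    idx = min(range(len(flat)), key=lambda i: abs(flat[i] - target))
    return idx, flat[idx]
```

The returned flat index can be mapped back to grid subscripts with `ind2sub` in MATLAB (or `divmod` in Python), if the row/column position is what's actually needed.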

How do you optimize this code for nn prediction?

Submitted by 纵然是瞬间 on 2019-12-20 04:25:16
Question: How do you optimize this code? At the moment it runs too slowly for the amount of data that goes through this loop. This code runs 1-nearest-neighbour: it predicts the label of training_element based on p_data_set. # [x] , [[x1],[x2],[x3]], [l1, l2, l3] def prediction(training_element, p_data_set, p_label_set): temp = np.array([], dtype=float) for p in p_data_set: temp = np.append(temp, distance.euclidean(training_element, p)) minIndex = np.argmin(temp) return p_label_set
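
The main cost in the code above is growing `temp` with `np.append` inside a Python loop, which reallocates and copies the array on every iteration. A hedged rewrite (assuming, as the comment suggests, a 1-D query vector, a 2-D data set, and a matching list of labels) computes all distances in one vectorised step:

```python
import numpy as np

def prediction(training_element, p_data_set, p_label_set):
    # Vectorised 1-NN: one broadcasted subtraction plus one reduction,
    # instead of an np.append per data point.
    diffs = np.asarray(p_data_set, dtype=float) - np.asarray(training_element, dtype=float)
    # Squared Euclidean distances; argmin is unchanged by dropping the sqrt.
    sq_dists = np.einsum("ij,ij->i", diffs, diffs)
    return p_label_set[np.argmin(sq_dists)]
```

Skipping the square root is safe because it doesn't change which distance is smallest; for many queries at once, `scipy.spatial.distance.cdist` (which the original's `distance.euclidean` import hints is available) would batch the whole distance matrix in one call.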

Is LSH about transforming vectors to binary vectors for hamming distance?

Submitted by ぐ巨炮叔叔 on 2019-12-20 03:16:41
Question: I have read some papers about LSH and I know that it is used for solving the approximate k-NN problem. We can divide the algorithm into two parts: Given a vector in D dimensions (where D is big) of any value, translate it with a set of N (where N << D) hash functions to a binary vector in N dimensions. Using Hamming distance, apply some search technique on the set of binary codes obtained from phase 1 to find the k-NN. The key point is that computing the Hamming distance for vectors in N dimensions
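
Phase 1 as described matches the random-hyperplane (SimHash) family of LSH: each hash function is the sign of a dot product with a random direction, so similar vectors (small angle) tend to agree on more bits. A hedged, stdlib-only sketch of that binarisation step:

```python
import random

def random_hyperplanes(dim, n_bits, seed=0):
    # One Gaussian random direction per output bit (N << D hash functions).
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def hash_vector(v, planes):
    # Each bit is the sign of the dot product with one random hyperplane.
    return tuple(1 if sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0 else 0
                 for p in planes)

def hamming(a, b):
    # Phase 2 compares the short binary codes, not the original D-dim vectors.
    return sum(x != y for x, y in zip(a, b))
```

This makes the trade-off in the question concrete: the expensive D-dimensional comparison is paid once per vector at indexing time, after which every query comparison is an N-bit Hamming distance, which fits in a machine word or two and is an XOR plus popcount.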

Broadcast Annoy object in Spark (for nearest neighbors)?

Submitted by 爱⌒轻易说出口 on 2019-12-19 10:24:37
Question: As Spark's mllib doesn't have nearest-neighbours functionality, I'm trying to use Annoy for approximate nearest neighbours. I try to broadcast the Annoy object and pass it to workers; however, it does not operate as expected. Below is code for reproducibility (to be run in PySpark). The problem is highlighted in the difference seen when using Annoy with vs. without Spark. from annoy import AnnoyIndex import random random.seed(42) f = 40 t = AnnoyIndex(f) # Length of item vector that will be
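
A commonly suggested workaround (hedged): an Annoy index is backed by a memory-mapped file and does not survive pickling, which is what Spark's broadcast does under the hood, so the broadcast copy arrives empty on the workers. Instead, save the index with `t.save(path)`, ship the file itself (e.g. `sc.addFile(path)` plus `SparkFiles.get`), and load it lazily on each worker with `AnnoyIndex(f).load(path)`. The sketch below demonstrates the underlying "serialise a path, reconstruct per worker" pattern with a pickle file standing in for the Annoy index; `get_nns_by_item` mirrors Annoy's real method name, but the body here is a toy 1-D stand-in, not Annoy's implementation.

```python
import pickle

class LazyIndex:
    # Only self.path crosses the serialisation boundary; the heavy index
    # object is rebuilt on first use inside each worker process.
    def __init__(self, path):
        self.path = path
        self._data = None

    def _ensure_loaded(self):
        if self._data is None:
            with open(self.path, "rb") as fh:
                self._data = pickle.load(fh)  # AnnoyIndex(f).load(path) in the real case

    def get_nns_by_item(self, i, n):
        # Toy stand-in: n nearest stored scalars to item i by absolute difference.
        self._ensure_loaded()
        target = self._data[i]
        order = sorted(range(len(self._data)), key=lambda j: abs(self._data[j] - target))
        return order[:n]
```

Because `_data` starts as None, a `LazyIndex` pickles cheaply, and each worker pays the load cost once rather than once per record.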

Bit string nearest neighbour searching

Submitted by 孤人 on 2019-12-19 08:53:35
Question: I have hundreds of thousands of sparse bit strings, each 32 bits long. I'd like to do a nearest-neighbour search on them, and look-up performance is critical. I've been reading up on various algorithms, but they seem to target text strings rather than binary strings. I think either locality-sensitive hashing or spectral hashing seem like good candidates, or I could look into compression. Will any of these work well for my bit-string problem? Any direction or guidance would be greatly appreciated. Answer 1:
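
Before reaching for LSH, it's worth noting (hedged) that 32-bit codes are short enough that a brute-force scan is often competitive: each candidate costs one XOR plus one popcount, so a few hundred thousand codes per query is typically sub-millisecond work in C, and still manageable in pure Python. A minimal sketch:

```python
def hamming32(a, b):
    # XOR then popcount. On Python 3.10+, (a ^ b).bit_count() is faster;
    # in C, __builtin_popcount on the XOR does the same in one instruction.
    return bin(a ^ b).count("1")

def nearest_bitstring(query, codes):
    # Brute-force linear scan: for 32-bit codes this is a tight loop of
    # XOR + popcount, with no index to build or tune.
    return min(codes, key=lambda c: hamming32(query, c))
```

If the scan ever becomes the bottleneck, a common refinement is to bucket the codes by popcount: a code at Hamming distance d from the query can only live in buckets whose popcount differs from the query's by at most d, which prunes most candidates without any approximation.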

Drawing decision boundaries in R

Submitted by 你说的曾经没有我的故事 on 2019-12-18 16:54:50
Question: I've got a series of modelled class labels from the knn function. I've got a data frame with basic numeric training data, and another data frame for test data. How would I go about drawing a decision boundary for the values returned from the knn function? I'll have to replicate my findings on a locked-down machine, so please limit the use of third-party libraries if possible. I only have two class labels, "orange" and "blue". They're plotted on a simple 2D plot with the training data. Again, I
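
The usual recipe needs nothing beyond base plotting: classify a dense grid of points covering the plot, then draw a contour where the predicted label changes (in R, `expand.grid` to build the grid, `knn` from the class package with the grid as the test set, and `contour` on the 0/1-coded predictions). A hedged, stdlib-only Python sketch of the grid-classification step, with a hand-rolled 1-NN so it is self-contained:

```python
def knn1(train_pts, train_labels, q):
    # 1-NN by squared Euclidean distance over 2-D training points.
    best = min(range(len(train_pts)),
               key=lambda i: (train_pts[i][0] - q[0]) ** 2 + (train_pts[i][1] - q[1]) ** 2)
    return train_labels[best]

def boundary_grid(train_pts, train_labels, xs, ys):
    # Classify every grid point; contouring these labels (filled.contour in R,
    # or matplotlib's contourf) makes the decision boundary visible.
    return [[knn1(train_pts, train_labels, (x, y)) for x in xs] for y in ys]
```

The finer the grid, the smoother the drawn boundary; a 100x100 grid is usually enough for a 2-D plot and stays fast even with this naive classifier.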