knn

Find input image (ID,passport) in imagesDB based on similarity

帅比萌擦擦* submitted on 2019-12-25 18:53:49
Question: I would like to decide whether an image is present in a list stored in a DB (e.g. pictures of IDs, passports, student cards, etc). I thought about using a KNN algorithm that will plot the K closest images. Options for the distance metric: the sum of Euclidean distances between corresponding pixels (img1[pixel_i], img2[pixel_i]); the sum of Euclidean distances between every pixel and every other pixel, multiplied by some factor that decreases with distance (pixel to pixel); the same as above, but with Manhattan... Do you know/think of
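
A minimal sketch of the first metric (a single overall Euclidean distance over corresponding pixels) with a brute-force K-closest lookup built on it; the image sizes, function names, and k below are illustrative assumptions, not part of the question:

```python
import numpy as np

def euclidean_pixel_distance(img1, img2):
    # Distance over corresponding pixels of two equally sized images.
    a = img1.astype(np.float64).ravel()
    b = img2.astype(np.float64).ravel()
    return float(np.sqrt(np.sum((a - b) ** 2)))

def k_nearest_images(query, db_images, k=3):
    # Indices of the k database images closest to the query image.
    dists = [euclidean_pixel_distance(query, img) for img in db_images]
    return np.argsort(dists)[:k]

# Illustrative usage with random "images"; real images must share one shape.
rng = np.random.default_rng(0)
db = [rng.integers(0, 256, (64, 64), dtype=np.uint8) for _ in range(10)]
print(k_nearest_images(db[4], db, k=3))  # index 4 should rank first
```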

Hadoop kNN join algorithm stuck at map 100% reduce 0%

谁说胖子不能爱 submitted on 2019-12-25 05:35:09
Question: 15/06/11 10:31:51 INFO mapreduce.Job: map 100% reduce 0%. I am trying to run the open-source kNN join MapReduce algorithm hbrj on Hadoop 2.6.0, installed on my laptop (OSX) as a single-node cluster in pseudo-distributed operation. (The source can be found here: http://www.cs.utah.edu/~lifeifei/knnj/). The algorithm consists of two MapReduce phases, where the second phase uses the first phase's output files as its input. The first phase maps and reduces successfully - I can also look into the

how to find important variable using knn in R

余生长醉 submitted on 2019-12-24 22:16:12
Question: I want to improve accuracy using the KNN algorithm. I have 23 factors (sex, age, payment, education, etc...). The problem is that there are too many variables, so I want to know which ones are actually effective. [info] dataset -> 10000 rows, 24 columns; the last column is default (1=yes, 0=no). I used 7000 rows for the training set and 3000 for the test set. When I use all variables, I get about 1000 misclassifications; a ROC curve shows 800 errors. But I want to reduce the error rate further. What method can I use?? If you
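
The question works in R; as a rough sketch of the same idea in Python (the language used by the other snippets on this page), permutation importance ranks features by how much shuffling each one hurts a fitted KNN model. The synthetic data, k, and split sizes here are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Stand-in for the 10000-row, 23-feature dataset described in the question.
X, y = make_classification(n_samples=10000, n_features=23, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=7000, random_state=0)

scaler = StandardScaler().fit(X_train)  # KNN is distance-based, so scale first
knn = KNeighborsClassifier(n_neighbors=15).fit(scaler.transform(X_train), y_train)

# Accuracy drop when a feature is shuffled = that feature's importance.
result = permutation_importance(knn, scaler.transform(X_test), y_test,
                                n_repeats=5, random_state=0)
for i in np.argsort(result.importances_mean)[::-1][:10]:
    print(f"feature {i}: {result.importances_mean[i]:.4f}")
```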

Detailed Explanation of Classic Machine Learning Algorithms with Python Implementation: K-Nearest Neighbors (KNN)

这一生的挚爱 submitted on 2019-12-24 20:51:04
(1) KNN is a supervised learning algorithm. KNN (K Nearest Neighbors) is, in theory, the simplest and easiest to understand of all machine learning algorithms. KNN is a form of instance-based learning: it computes the distance between the feature values of new data and those of the training data, then selects the K (K>=1) nearest neighbors for classification (by majority vote) or for regression. If K=1, the new data point is simply assigned the class of its nearest neighbor. Is KNN supervised or unsupervised learning? Consider the definitions first. In supervised learning, every data point has an explicit label (classification for discrete targets, regression for continuous targets), and the learned model can assign new data to a definite class or produce a definite predicted value. In unsupervised learning, the data has no labels; the learned model is a pattern extracted from the data (decisive features, clusters, and so on); for example, clustering uses the learned model to judge which of the existing groups new data most resembles. When KNN is used for classification, every training example has an explicit label and the label of new data can likewise be determined explicitly; when KNN is used for regression, it also predicts a definite value from the neighbors' values, so KNN is supervised learning. The KNN algorithm proceeds as follows: choose a distance measure and, using all the features, compute the distance between the new data point and every point in the labeled dataset; sort by increasing distance and select the k points closest to the new data point; for discrete classification, return the most frequent class among the k points as the predicted class; for regression, return the weighted value of the k points as the prediction. (2) Key points of the KNN algorithm
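
The steps above map directly onto a few lines of NumPy; below is a minimal sketch of the classification case (variable names are illustrative, not from the article):

```python
import numpy as np
from collections import Counter

def knn_classify(x_new, X_train, y_train, k=3):
    # 1. Euclidean distance from the new point to every training point
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # 2. Indices of the k nearest neighbours, in order of increasing distance
    nearest = np.argsort(dists)[:k]
    # 3. Majority vote among the labels of the k neighbours
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
y_train = np.array(['A', 'A', 'B', 'B'])
print(knn_classify(np.array([0.1, 0.2]), X_train, y_train, k=3))  # -> 'B'
```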

Accuracy score for a KNN model (IRIS data)

为君一笑 submitted on 2019-12-23 04:04:15
Question: What might be some key factors for increasing or stabilizing the accuracy score (so that it does NOT vary significantly) of this basic KNN model on the IRIS data? Attempt: from sklearn import neighbors, datasets, preprocessing from sklearn.model_selection import train_test_split from sklearn.metrics import accuracy_score from sklearn.metrics import classification_report from sklearn.metrics import confusion_matrix iris = datasets.load_iris() X, y = iris.data[:, :], iris.target Xtrain, Xtest, y_train, y_test =
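
The excerpt's code is cut off; a complete, runnable version of this kind of experiment could look like the following. The test_size, random_state, stratified split, scaling, and n_neighbors are assumptions, not taken from the truncated question:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

iris = datasets.load_iris()
X, y = iris.data, iris.target

# A fixed random_state and a stratified split keep the score from varying between runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(scaler.transform(X_train), y_train)

y_pred = knn.predict(scaler.transform(X_test))
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```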

User defined termvectors in ElasticSearch

此生再无相见时 submitted on 2019-12-22 17:58:36
Question: How (if at all possible) can one insert an arbitrary term vector into an ElasticSearch index? ES computes term vectors behind the scenes in order to carry out its text-mining tasks, but it would be useful to be able to enter any list of (term, weight) pairs instead. Why? Well, for instance, though ES enables kNN (k-nearest-neighbors) for k=2 in the context of geographic proximity, it doesn't have any explicit k>2 functionality. If we were able to insert our own term-vectors, we could hack a k>2
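
ElasticSearch itself does not expose this, so purely as an illustration of what the question is after, here is a client-side sketch of k-nearest-neighbour retrieval (k > 2) over user-defined (term, weight) vectors using cosine similarity; the document names and weights are made up:

```python
import math

docs = {
    "doc1": {"passport": 0.9, "photo": 0.4},
    "doc2": {"passport": 0.8, "id": 0.5},
    "doc3": {"invoice": 0.7, "total": 0.6},
}

def cosine(u, v):
    # Cosine similarity between two sparse (term -> weight) vectors.
    terms = set(u) | set(v)
    dot = sum(u.get(t, 0.0) * v.get(t, 0.0) for t in terms)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn(query_vec, k=3):
    # The k documents whose term vectors are most similar to the query vector.
    scored = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return scored[:k]

print(knn({"passport": 1.0, "id": 0.3}, k=3))
```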

Pass in OpenCV image to KNearest's find_nearest

霸气de小男生 submitted on 2019-12-22 15:59:09
Question: I've been following the examples here on setting up Python for OCR by training OpenCV with kNN classification. I followed the first example and generated a knn_data.npz file that stores the training data and the training labels for later. What I'm trying to do now is to recall that training data and apply it to an OpenCV image that has a single character inside of it: # Load training data trainingData = np.load('knn_data.npz') train = trainingData['train'] trainLabels = trainingData['train_labels
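
A sketch of completing that step with the OpenCV 3.x cv2.ml module (the question's find_nearest is the older 2.4-era name). The .npz key names come from the question; the image path, the 20x20 cell size, and k=5 are assumptions:

```python
import cv2
import numpy as np

# Load the saved training data; keys match the question's knn_data.npz.
with np.load('knn_data.npz') as data:
    train = data['train'].astype(np.float32)                # (N, n_features)
    train_labels = data['train_labels'].astype(np.float32)  # (N, 1)

knn = cv2.ml.KNearest_create()
knn.train(train, cv2.ml.ROW_SAMPLE, train_labels)

# Prepare the single-character image the same way the training samples were
# prepared: grayscale, resized to the training cell size, then flattened.
img = cv2.imread('single_char.png', cv2.IMREAD_GRAYSCALE)
sample = cv2.resize(img, (20, 20)).reshape(1, -1).astype(np.float32)

ret, result, neighbours, dist = knn.findNearest(sample, k=5)
print(result)  # predicted label for the character
```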

Naive bayesian classifier - multiple decisions

只愿长相守 submitted on 2019-12-22 10:26:51
Question: I need to know whether the naive Bayesian classifier can be used to generate multiple decisions. I couldn't find any examples that provide evidence of it supporting multiple decisions. I'm new to this area, so I'm a bit confused. Actually, I need to develop character-recognition software, where I need to identify what the given character is. It seems the Bayesian classifier can be used to decide whether a given character is a particular character or not, but it cannot give any other
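
As a sketch (the question doesn't name a library): a naive Bayes classifier handles more than two classes directly, and scikit-learn's GaussianNB can also report a posterior probability for every candidate class, which gives multiple ranked decisions rather than a single yes/no. The toy feature values below are made up:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy feature vectors for three character classes; in a real OCR system these
# would be pixel or shape features extracted from training images.
X_train = np.array([[0.10, 0.90], [0.20, 0.80],    # class 'A'
                    [0.80, 0.20], [0.90, 0.10],    # class 'B'
                    [0.50, 0.50], [0.45, 0.55]])   # class 'C'
y_train = np.array(['A', 'A', 'B', 'B', 'C', 'C'])

clf = GaussianNB()
clf.fit(X_train, y_train)

x_new = np.array([[0.15, 0.85]])
print(clf.predict(x_new))        # single best class
print(clf.classes_)              # all candidate classes
print(clf.predict_proba(x_new))  # posterior for every class: multiple ranked decisions
```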

Value of k in k nearest neighbor algorithm

﹥>﹥吖頭↗ submitted on 2019-12-22 07:27:12
Question: I have 7 classes that need to be classified and I have 10 features. Is there an optimal value of k that I should use in this case, or do I have to run KNN for values of k between 1 and 10 (around 10) and determine the best value with the help of the algorithm itself? Answer 1: In addition to the article I posted in the comments, there is this one as well, which suggests: Choice of k is very critical - a small value of k means that noise will have a higher influence on the result. A large value
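
One common way to let the data choose k is a cross-validated grid search, sketched below; the synthetic 7-class, 10-feature dataset, the range of k values, and cv=5 are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset with the question's shape: 7 classes, 10 features.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=8,
                           n_classes=7, random_state=0)

param_grid = {"n_neighbors": list(range(1, 31, 2))}  # odd values reduce ties
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)  # the k with the best cross-validated accuracy
print(search.best_score_)
```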