knn

KNN Experiment

Submitted by 两盒软妹~` on 2019-12-21 20:00:31
In this post we study KNN. The files and code can be obtained from my GitHub: https://github.com/liuzuoping/MeachineLearning-Case

k-means experiment:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

plt.figure(figsize=(12, 12))

# number of samples
n_samples = 1500
# random seed
random_state = 170
# generate the dataset
X, y = make_blobs(n_samples=n_samples, random_state=random_state)

# effect of choosing the wrong number of clusters
y_pred = KMeans(n_clusters=2, random_state=random_state).fit_predict(X)
plt.subplot(221)
plt.scatter(X[y_pred == 0][:, 0], X[y_pred == 0][:, 1]
```
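
Since the snippet above is cut off, here is a self-contained, numpy-only sketch of the same experiment. It is an assumption-laden stand-in: Lloyd's algorithm replaces sklearn's `KMeans`, and hand-made Gaussian blobs replace `make_blobs`, so the numbers differ from the original.

```python
import numpy as np

rng = np.random.default_rng(170)
# Three well-separated 2-D blobs, 500 points each (stand-in for make_blobs)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.8, size=(500, 2)) for c in centers])

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid, then re-average."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):  # keep the old centroid if a cluster empties out
                centroids[j] = pts.mean(axis=0)
    return labels

labels2 = kmeans(X, k=2)  # too few clusters: two blobs get lumped together
labels3 = kmeans(X, k=3)  # matching cluster count: the blobs are recovered
```

With `k=2` one centroid necessarily absorbs two of the three blobs, which is exactly the mis-specification the plot in the post illustrates.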

How do I avoid time leakage in my KNN model?

Submitted by 我的未来我决定 on 2019-12-21 18:05:24
Question: I am building a KNN model to predict housing prices. I'll go through my data and my model and then my problem.

Data -

```
# A tibble: 81,334 x 4
   latitude longitude close_date          close_price
      <dbl>     <dbl> <dttm>                    <dbl>
 1     36.4     -98.7 2014-08-05 06:34:00     147504.
 2     36.6     -97.9 2014-08-12 23:48:00     137401.
 3     36.6     -97.9 2014-08-09 04:00:40     239105.
```

Model -

```r
library(caret)
training.samples <- data$close_price %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- data[training.samples, ]
test.data <- data[
```
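
The usual fix for time leakage with data like the above is to split chronologically rather than randomly: `createDataPartition` samples at random, so sales that close after a test sale can end up in its training set. A minimal sketch of a chronological split, using hypothetical timestamps and prices in place of the question's tibble:

```python
import numpy as np

# Hypothetical stand-in for the question's data: close timestamps (in days) and prices
rng = np.random.default_rng(0)
close_ts = np.sort(rng.uniform(0, 365, size=200))
close_price = 150_000 + 20_000 * rng.standard_normal(200)

# Chronological 80/20 split: everything in the test set closes AFTER the
# training period, so no future sale can leak into training
cutoff = np.quantile(close_ts, 0.8)
train_mask = close_ts < cutoff
test_mask = ~train_mask
```

For KNN specifically, the same principle extends to prediction time: when scoring a sale, only allow neighbours whose `close_date` precedes it.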

using k-NN in R with categorical values

Submitted by 人走茶凉 on 2019-12-21 05:12:16
Question: I'm looking to perform classification on data with mostly categorical features. For that purpose, Euclidean distance (or any other distance that assumes numerical data) doesn't fit. I'm looking for a kNN implementation for [R] where it is possible to select different distance methods, such as Hamming distance. Is there a way to use common kNN implementations like the one in {class} with different distance metric functions? I'm using R 2.15

Answer 1: As long as you can calculate a distance/dissimilarity matrix
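
kNN itself only needs a distance function, not numeric coordinates, so a custom metric is easy to sketch by hand. Below is a toy majority-vote kNN under Hamming distance (count of mismatching categorical features); the data and labels are hypothetical:

```python
import numpy as np
from collections import Counter

def hamming_knn(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k Hamming-nearest training rows."""
    # Hamming distance: number of feature positions where the values differ
    d = (train_X != query).sum(axis=1)
    nearest = np.argsort(d, kind="stable")[:k]
    return Counter(train_y[nearest]).most_common(1)[0][0]

# Hypothetical toy data: two categorical features, two classes
X = np.array([["red", "s"], ["red", "m"], ["blue", "s"], ["blue", "l"]])
y = np.array(["A", "A", "B", "B"])
pred = hamming_knn(X, y, np.array(["red", "l"]), k=3)
```

The same pattern works with any precomputed dissimilarity matrix, which is the direction the (truncated) answer above is heading.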

Pre-processing before digit recognition with KNN classifier

Submitted by 不想你离开。 on 2019-12-20 19:45:13
Question: Right now I'm trying to create a digit recognition system using OpenCV. There are many articles and examples on the web (and even on StackOverflow). I decided to use a KNN classifier because this solution is the most popular on the web. I found a database of handwritten digits with a training set of 60k examples and an error rate of less than 5%. I used this tutorial as an example of how to work with this database using OpenCV. I'm using exactly the same technique, and on the test data (`t10k-images.idx3-ubyte`) I
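
For MNIST-style digits, preprocessing often matters more than the classifier: the dataset's own images are size-normalized and centred by intensity centre of mass, so your own input digits should be too before KNN sees them. A hypothetical numpy sketch of just the centring step (a square blob stands in for a digit):

```python
import numpy as np

def center_by_mass(img):
    """Shift a grayscale digit so its intensity centre of mass sits at the image centre."""
    ys, xs = np.nonzero(img)
    if len(ys) == 0:
        return img
    w = img[ys, xs].astype(float)
    cy = (ys * w).sum() / w.sum()
    cx = (xs * w).sum() / w.sum()
    shift_y = int(round(img.shape[0] / 2 - cy))
    shift_x = int(round(img.shape[1] / 2 - cx))
    # np.roll returns a shifted copy; the original image is untouched
    return np.roll(np.roll(img, shift_y, axis=0), shift_x, axis=1)

img = np.zeros((28, 28), dtype=np.uint8)
img[2:6, 2:6] = 255  # a blob stuck in the top-left corner
centered = center_by_mass(img)
```

Skipping this step is a common reason a classifier that scores well on the test file fails badly on hand-drawn input.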

find the k nearest neighbours of a point in 3d space with python numpy

Submitted by 佐手、 on 2019-12-20 06:14:06
Question: I have a 3d point cloud of n points in the format `np.array((n,3))`. For example, this could be something like:

```python
P = [[x1,y1,z1],[x2,y2,z2],[x3,y3,z3],[x4,y4,z4],[x5,y5,z5],.....[xn,yn,zn]]
```

I would like to be able to get the k nearest neighbours of each point. So, for example, the k nearest neighbours of P1 might be P2, P3, P4, P5, P6, and the KNN of P2 might be P100, P150, P2, etc. How does one go about doing that in Python?

Answer 1: This can be solved neatly with scipy.spatial.distance.pdist. First, let's create
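
If scipy is unavailable, the same result can be had by brute force with plain numpy for a modestly sized cloud. A sketch, with random points standing in for the `(n, 3)` array above:

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.random((100, 3))  # hypothetical n x 3 point cloud
k = 5

# Pairwise squared Euclidean distances, brute force: O(n^2) memory, fine for small n
d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
np.fill_diagonal(d2, np.inf)             # exclude each point from its own neighbour list
knn_idx = np.argsort(d2, axis=1)[:, :k]  # row i holds the k nearest neighbours of P[i]
```

For large n, a spatial index such as `scipy.spatial.cKDTree` avoids the quadratic blow-up, which is where the truncated answer above is going.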

Broadcast Annoy object in Spark (for nearest neighbors)?

Submitted by 爱⌒轻易说出口 on 2019-12-19 10:24:37
Question: As Spark's mllib doesn't have nearest-neighbours functionality, I'm trying to use Annoy for approximate nearest neighbours. I try to broadcast the Annoy object and pass it to workers; however, it does not operate as expected. Below is code for reproducibility (to be run in PySpark). The problem is highlighted in the difference seen when using Annoy with vs. without Spark.

```python
from annoy import AnnoyIndex
import random
random.seed(42)

f = 40
t = AnnoyIndex(f)  # Length of item vector that will be
```

Machine Learning --- k-Nearest Neighbour (KNN) Classification Algorithm

Submitted by 早过忘川 on 2019-12-19 07:00:08
k-Nearest Neighbour (KNN) Classification Algorithm

1. k-Nearest Neighbour (KNN)

The k-Nearest Neighbour (KNN) classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. Its idea is this: if the majority of the k samples most similar to a given sample in feature space (i.e., its nearest neighbours there) belong to a particular class, then the sample belongs to that class as well. Stated more formally: given a training dataset, for a new input instance, find the K instances in the training set closest to it (the K neighbours mentioned above); if the majority of these K instances belong to some class, classify the input instance into that class.

2. Algorithm Principle

As shown in the figure above, there are two classes of samples, marked with blue squares and red triangles respectively, while the green circle in the middle of the figure marks the data point to be classified. That is, we do not yet know which class the green point belongs to (blue square or red triangle), and the task is now to classify it.

As the saying goes, birds of a feather flock together: to judge what kind of person someone is, we can often start from the friends around them. To decide which class the green circle in the figure belongs to, we likewise start from its neighbours. But how many neighbours should we look at at once? From the figure you can also see: if K=3
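
The idea above can be sketched in a few lines of Python; the coordinates below are hypothetical toy points standing in for the squares and triangles in the figure:

```python
import math
from collections import Counter

def knn_classify(query, samples, k):
    """samples: list of ((x, y), label). Majority vote among the k nearest points."""
    by_dist = sorted(samples, key=lambda s: math.dist(query, s[0]))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Two classes, as in the figure: blue squares vs red triangles
samples = [((1, 1), "square"), ((1, 2), "square"), ((2, 1), "square"),
           ((5, 5), "triangle"), ((5, 6), "triangle"), ((6, 5), "triangle")]
label = knn_classify((2, 2), samples, k=3)
```

Note that the answer can change with K (the point the truncated paragraph is about to make), which is why K is usually chosen by cross-validation.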

Error with knn function

Submitted by 不打扰是莪最后的温柔 on 2019-12-17 19:27:28
Question: I try to run this line:

```r
knn(mydades.training[,-7], mydades.test[,-7], mydades.training[,7], k=5)
```

but I always get this error:

```
Error in knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
  NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
  NAs introduced by coercion
2: In knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
  NAs introduced by coercion
```

Any idea please?
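
"NAs introduced by coercion" typically means at least one feature column is non-numeric (a factor or character column), which gets coerced to NA and then trips the NA/NaN/Inf check inside `knn`. The same sanity check can be sketched in Python, with hypothetical mixed-type rows standing in for the questioner's data frame:

```python
import numpy as np

# Hypothetical rows as they might come out of a badly parsed CSV: mixed column types
rows = [[1.2, "3.4", "abc"],
        [2.0, "5.1", "7"]]

def to_float_or_nan(v):
    try:
        return float(v)
    except (TypeError, ValueError):
        return float("nan")  # mirrors R's "NAs introduced by coercion"

X = np.array([[to_float_or_nan(v) for v in row] for row in rows])
bad_cols = np.where(np.isnan(X).any(axis=0))[0]  # columns to encode or drop before KNN
```

In R, `sapply(mydades.training, class)` will reveal the offending columns; encode them numerically (or drop them) before calling `knn`.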

Finding K-nearest neighbors and its implementation

Submitted by 房东的猫 on 2019-12-17 02:13:27
Question: I am working on classifying simple data using KNN with Euclidean distance. I have seen an example of what I would like to do, done with the MATLAB `knnsearch` function, as shown below:

```matlab
load fisheriris
x = meas(:,3:4);
gscatter(x(:,1),x(:,2),species)
newpoint = [5 1.45];
[n,d] = knnsearch(x,newpoint,'k',10);
line(x(n,1),x(n,2),'color',[.5 .5 .5],'marker','o','linestyle','none','markersize',10)
```

The above code takes a new point, i.e. [5 1.45], and finds the 10 closest values to the new point.
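
The neighbour search itself is a one-liner once you write out the Euclidean distances. A numpy sketch of what `knnsearch` does under the hood, with random points standing in for `meas(:,3:4)`:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((150, 2)) * [6, 2] + [1, 0]  # hypothetical stand-in for meas(:, 3:4)
newpoint = np.array([5.0, 1.45])

d = np.linalg.norm(x - newpoint, axis=1)  # Euclidean distance to every sample
n = np.argsort(d)[:10]                    # indices of the 10 closest points, nearest first
```

`n` and `d[n]` correspond to the `[n, d]` outputs of MATLAB's `knnsearch` (up to 0- vs 1-based indexing).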