Question
I've just written a knn model in R. However, I don't know how to use the output to predict new data.
# split into train (treino) and test (teste)
treino_index <- sample(seq_len(nrow(iris)), size = round(0.75*nrow(iris)))
treino <- iris[treino_index, ]
teste <- iris[-treino_index, ]
# take a look at the sample
head(treino)
head(teste)
# save the species labels for later
treino_especie = treino$Species
teste_especie = teste$Species
# exclude species from train and test dataset
treino = treino[-5]
teste = teste[-5]
# run knn
library(class)
iris_teste_knn <- knn(train = treino, test = teste, cl = treino_especie, k = 3, prob = TRUE)
# model performance using cross table
install.packages('gmodels')
library('gmodels')
CrossTable(x=teste_especie, y=iris_teste_knn, prop.chisq=FALSE)
How do I apply this to new data? Suppose I have a specimen with the following measurements: Sepal.Length = 5.0, Sepal.Width = 3.3, Petal.Length = 1.3, Petal.Width = 0.1. How do I find out which species it belongs to?
Answer 1:
kNN is a lazy classifier. Unlike logistic regression or tree-based algorithms, it does not create a fitted model that is kept and used for prediction later; knn classifies the test cases directly from the training data in a single call. Once you have finished tuning the parameters (here k), call knn again with the chosen values and pass the new cases as test. A plain numeric vector is treated as a single test case. Use:
x = c(5.0, 3.3, 1.3, 0.1) # test case
knn(train = treino , test = x , cl= treino_especie, k = 3,prob=TRUE)
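The same call can be sketched end to end with the new measurement stored as a one-row data frame, which keeps the column names next to the values. The seed and the names nova_obs and pred are illustrative assumptions, not part of the question.

```r
library(class)

set.seed(42)  # assumption: fixed seed so the split is reproducible

# rebuild the 75/25 split from the question
treino_index <- sample(seq_len(nrow(iris)), size = round(0.75 * nrow(iris)))
treino <- iris[treino_index, -5]              # predictors only
treino_especie <- iris$Species[treino_index]  # matching class labels

# new observation as a one-row data frame, columns in the training order
nova_obs <- data.frame(Sepal.Length = 5.0, Sepal.Width = 3.3,
                       Petal.Length = 1.3, Petal.Width = 0.1)

pred <- knn(train = treino, test = nova_obs, cl = treino_especie,
            k = 3, prob = TRUE)

as.character(pred)   # predicted species
attr(pred, "prob")   # share of the neighbours that voted for the winning class
```

For these measurements, which lie well inside the setosa range, the predicted class is setosa. The "prob" attribute is only attached when prob = TRUE.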
Answer 2:
To predict new data with a kNN model in R, pass the new data as the test argument of the knn function. It must contain the same predictor columns as treino, for example:
iris_teste_knn <- knn(train = treino, test = new.Data, cl = treino_especie, k = 3, prob = TRUE)
A common rule of thumb is to set k to roughly the square root of the number of training observations. If the true labels of the new data are known (here taken from new.Data$label), you can evaluate the predictions with the CrossTable function from the gmodels package, as follows:
library(gmodels)
CrossTable(x = new.Data$label,
           y = iris_teste_knn,
           prop.chisq = FALSE)
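The square-root rule above is only a starting point. A minimal sketch, assuming the same 75/25 split as in the question, compares it with leave-one-out accuracy from knn.cv in the class package. The seed, the candidate grid and the names k_sqrt, candidatos and acuracia are illustrative assumptions.

```r
library(class)

set.seed(42)  # assumption: fixed seed for a reproducible split
treino_index <- sample(seq_len(nrow(iris)), size = round(0.75 * nrow(iris)))
treino <- iris[treino_index, -5]
treino_especie <- iris$Species[treino_index]

# rule of thumb: k close to the square root of the training size
k_sqrt <- round(sqrt(nrow(treino)))

# leave-one-out accuracy on the training set for a few odd values of k
candidatos <- seq(1, 15, by = 2)
acuracia <- sapply(candidatos, function(k) {
  pred <- knn.cv(train = treino, cl = treino_especie, k = k)
  mean(pred == treino_especie)
})

data.frame(k = candidatos, acuracia = acuracia)
```

If several values of k give similar accuracy, any of them is a reasonable choice; the selected k is then passed to knn together with the new cases.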
Source: https://stackoverflow.com/questions/50632410/how-do-i-use-knn-model-for-new-data-in-r