Function to calculate Euclidean distance in R

Posted by 依然范特西╮ on 2021-02-10 14:19:14

Question


I am trying to implement a KNN classifier in R from scratch on the iris data set, and as part of this I have written a function to calculate the Euclidean distance. Here is my code.

known_data <- iris[1:15,c("Sepal.Length", "Petal.Length", "Class")]
unknown_data <- iris[16,c("Sepal.Length", "Petal.Length")]

# euclidean distance
 euclidean_dist <- function(k,unk) {
 distance <- 0
 for(i in 1:nrow(k))
 distance[i] <- sqrt((k[,1][i] - unk[,1][i])^2 + (k[,2][i] - unk[,2][i])^2)
 return(distance)
} 

euclidean_dist(known_data, unknown_data)

However, when I call the function, it returns the first value correctly and the rest as NA. Could anyone show me where I have gone wrong in the code? Thanks in advance.


Answer 1:


The aim is to calculate the distance between the ith row of known_data, and the single unknown_data point.

How to fix your code

When you calculate distance[i], you index the ith element of each unknown_data column (unk[,1][i]). Because unknown_data has only one row, that element doesn't exist for i > 1, so the result is NA. I believe your code should run fine if you make the following edits:

known_data <- iris[1:15,c("Sepal.Length", "Petal.Length", "Class")] 
unknown_data <- iris[16,c("Sepal.Length", "Petal.Length")]

# euclidean distance
euclidean_dist <- function(k,unk) {
  # Make distance a vector [although not technically required]
  distance <- rep(0, nrow(k))

  for (i in 1:nrow(k)) {
    # Change unk[,1][i] to unk[1,1] and similarly for unk[,2][i]
    distance[i] <- sqrt((k[,1][i] - unk[1,1])^2 + (k[,2][i] - unk[1,2])^2)
  }

  return(distance)
} 

euclidean_dist(known_data, unknown_data)
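
The NA values can be reproduced in isolation. The snippet below is only an illustrative sketch: it assumes the built-in iris data set, and the names col1, known_col and diffs are made up for the demonstration.

```r
# Selecting one column from a one-row data frame gives a length-1 vector
unknown_data <- iris[16, c("Sepal.Length", "Petal.Length")]
col1 <- unknown_data[, 1]

length(col1)  # 1
col1[1]       # the only element
col1[2]       # NA: indexing past the end of a vector yields NA, not an error

# NA propagates through arithmetic, so only the first difference is a number
known_col <- iris[1:15, "Sepal.Length"]
diffs <- known_col[1:3] - c(col1[1], col1[2], col1[3])
is.na(diffs)  # FALSE TRUE TRUE
```

This is why the original loop produced one valid distance followed by NA for every later row.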

One final note: in the built-in iris data set that ships with R, the label column is named Species rather than Class, so the column selection may need "Species" in place of "Class".

An alternative method

As suggested by @Roman Luštrik, the entire aim of getting the Euclidean distances can be achieved with a simple one-liner:

sqrt((known_data[, 1] - unknown_data[, 1])^2 + (known_data[, 2] - unknown_data[, 2])^2)

This is very similar to the function you wrote, but does it in vectorised form, rather than through a loop, which is often a preferable way of doing things in R.
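
The same vectorised idea extends to any number of numeric feature columns. The following is a minimal sketch rather than part of the original answer: the function name euclidean_dist_vec is hypothetical, and it assumes the built-in iris data set with only numeric columns passed in.

```r
# Hypothetical helper: distance from every row of `known` to the single row `unk`
euclidean_dist_vec <- function(known, unk) {
  known <- as.matrix(known)
  unk <- unlist(unk[1, ], use.names = FALSE)
  # sweep() subtracts unk[j] from every entry in column j of known
  diffs <- sweep(known, 2, unk, FUN = "-")
  sqrt(rowSums(diffs^2))
}

known_data <- iris[1:15, c("Sepal.Length", "Petal.Length")]
unknown_data <- iris[16, c("Sepal.Length", "Petal.Length")]

d_vec <- euclidean_dist_vec(known_data, unknown_data)

# Cross-check against the two-column one-liner
d_one <- sqrt((known_data[, 1] - unknown_data[, 1])^2 +
              (known_data[, 2] - unknown_data[, 2])^2)
all.equal(unname(d_vec), d_one)
```

For the KNN step, order(d_vec)[1:k] would then give the row indices of the k nearest known points.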




Answer 2:


Another approach is to use the h2o package:

# load the library
library(h2o)
# initialize the node
h2o.init()
# convert the numeric feature columns to H2OFrames
known_h2o <- as.h2o(known_data[, 1:2])
unknown_h2o <- as.h2o(unknown_data)
# create a matrix in which the distances will be recorded:
# one row per known point, one column per unknown point
matrix1 <- matrix(NA_real_, nrow = nrow(known_h2o), ncol = nrow(unknown_h2o))
# loop over the unknown rows and calculate the distance to every known row
for (i in 1:nrow(unknown_h2o)) {
  matrix1[, i] <- as.data.frame(h2o.distance(known_h2o, unknown_h2o[i, ], "l2"))[, 1]
}


Source: https://stackoverflow.com/questions/45780199/function-to-calculate-euclidean-distance-in-r
