Question
I am learning about the "kohonen" library for the R programming language. I created some artificial data to try some of the functions on. Running the "supersom()" function on continuous data only (i.e. type = as.numeric) works well. However, when I run "supersom()" on both continuous and categorical (type = as.factor) data, I get an error ("Argument data should be numeric").
The "supersom()" function has an argument called "dist.fct" (distance function) which lets the user specify which type of distance (e.g. "euclidean" for continuous, "tanimoto" for categorical) should be used for different columns. I created a data set with 4 continuous variables and 3 categorical variables. Following the documentation at https://www.rdocumentation.org/packages/kohonen/versions/2.0.5/topics/supersom , I tried to run the example:
#load libraries
library(kohonen)
library(dplyr)
#create and format data
a = rnorm(1000,10,10)
b = rnorm(1000,10,5)
c = rnorm(1000,5,5)
d = rnorm(1000,5,10)
e <- sample( LETTERS[1:4], 1000 , replace=TRUE, prob=c(0.25, 0.25, 0.25, 0.25) )
f <- sample( LETTERS[1:5], 1000 , replace=TRUE, prob=c(0.2, 0.2, 0.2, 0.2, 0.2) )
g <- sample( LETTERS[1:2], 1000 , replace=TRUE, prob=c(0.5, 0.5) )
data = data.frame(a,b,c,d,e,f,g)
data$e = as.factor(data$e)
data$f = as.factor(data$f)
data$g = as.factor(data$g)
cols <- 1:4
data[cols] <- scale(data[cols])
data = as.matrix(data)
#som function
som <- supersom(data = data, grid = somgrid(10, 10, "hexagonal"),
                dist.fct = c("euclidean", "euclidean", "euclidean", "euclidean", "tanimoto", "tanimoto", "tanimoto", "tanimoto"), keep.data = TRUE)
#sources:
https://cran.r-project.org/web/packages/kohonen/kohonen.pdf
https://www.rdocumentation.org/packages/kohonen/versions/2.0.5/topics/supersom
However, this produces an error: "Error in check.data(data): Argument data should be numeric". According to the documentation (see the sources I attached), there are default values for the "dist.fct" argument. Therefore, I also tried leaving it out, hoping that the default values would be selected automatically:
som <- supersom(data = data, grid = somgrid(10, 10, "hexagonal"), keep.data = TRUE)
But this also produced a similar error.
Does anyone know what I am doing wrong?
Thanks
Answer 1:
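A matrix can hold data of only one type, so if you put factor or character columns into a matrix, every other value in it is converted to character as well. That is why as.matrix(data) no longer passes the numeric check. Either keep only the numeric columns in the matrix, or skip as.matrix() and pass the data frame as a list so that each column keeps its own type.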
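library(kohonen)
# 'data' is the data frame from the question, without the as.matrix() step
cols <- 1:4
# standardise the four continuous columns
data[cols] <- scale(data[cols])
# as.list() passes each column separately, so each keeps its own type
som <- supersom(data = as.list(data), grid = somgrid(10, 10, "hexagonal"),
                dist.fct = "euclidean", keep.data = TRUE)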
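To see the coercion described above in isolation, here is a minimal base-R sketch. The data frame `df` and its column names are made up for illustration and are not the data from the question; no kohonen functions are involved.

```r
# Hypothetical toy data frame: one numeric column and one factor column.
df <- data.frame(
  x   = c(1.5, 2.5, 3.5),
  grp = factor(c("A", "B", "A"))
)

# as.matrix() on mixed columns coerces every value to character,
# so the result is no longer numeric.
m <- as.matrix(df)
is.numeric(m)   # FALSE
typeof(m)       # "character"

# as.list() keeps each column as its own element with its own type.
cols <- as.list(df)
col_is_numeric <- sapply(cols, is.numeric)  # x: TRUE,  grp: FALSE
col_is_factor  <- sapply(cols, is.factor)   # x: FALSE, grp: TRUE
```

Running the same checks on your own data before calling supersom() is a quick way to confirm whether a conversion step has silently changed the column types.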
Source: https://stackoverflow.com/questions/65757562/r-error-error-in-check-data-argument-should-be-numeric