Question
I am learning about the "kohonen" package in R for the purpose of making Self Organizing Maps (SOM, also called Kohonen Networks - a type of Machine Learning algorithm). I am following this R language tutorial over here: https://www.rpubs.com/loveb/som
I tried to create my own data (this time with both "factor" and "numeric" variables) and run the SOM algorithm (this time using the "supersom()" function instead):
#load libraries and adjust colors
library(kohonen) #fitting SOMs
library(ggplot2) #plots
library(RColorBrewer) #colors, using predefined palettes
contrast <- c("#FA4925", "#22693E", "#D4D40F", "#2C4382", "#F0F0F0", "#3D3D3D") #my own, contrasting pairs
cols <- brewer.pal(10, "Paired")
#create and format data
a =rnorm(1000,10,10)
b = rnorm(1000,10,5)
c = rnorm(1000,5,5)
d = rnorm(1000,5,10)
e <- sample( LETTERS[1:4], 100 , replace=TRUE, prob=c(0.25, 0.25, 0.25, 0.25) )
f <- sample( LETTERS[1:5], 100 , replace=TRUE, prob=c(0.2, 0.2, 0.2, 0.2, 0.2) )
g <- sample( LETTERS[1:2], 100 , replace=TRUE, prob=c(0.5, 0.5) )
data = data.frame(a,b,c,d,e,f,g)
data$e = as.factor(data$e)
data$f = as.factor(data$f)
data$g = as.factor(data$g)
cols <- 1:4
data[cols] <- scale(data[cols])
#som model
som <- supersom(data= as.list(data), grid = somgrid(10,10, "hexagonal"),
dist.fct = "euclidean", keep.data = TRUE)
From here, I was able to successfully make some of the basic plots:
#plots
#pretty gradient colors
colour1 <- tricolor(som$grid)
colour4 <- tricolor(som$grid, phi = c(pi/8, 6, -pi/6), offset = 0.1)
plot(som, type="changes")
plot(som, type="count")
plot(som, type="quality", shape = "straight")
plot(som, type="dist.neighbours", palette.name=grey.colors, shape = "straight")
However, the problem arises when I try to make individual plots for each variable:
#error
var <- 1 #define the variable to plot
plot(som, type = "property", property = getCodes(som)[,var], main=colnames(getCodes(som))[var], palette.name=terrain.colors)
var <- 6 #define the variable to plot
plot(som, type = "property", property = getCodes(som)[,var], main=colnames(getCodes(som))[var], palette.name=terrain.colors)
This produces an error: "Error: Incorrect Number of Dimensions"
A similar error ("NAs introduced by coercion") is produced when attempting to cluster the SOM network:
#cluster (error)
set.seed(33) #for reproducibility
fit_kmeans <- kmeans(data, 3) #3 clusters are used, as indicated by the wss development.
cl_assignmentk <- fit_kmeans$cluster[data$unit.classif]
par(mfrow=c(1,1))
plot(som, type="mapping", bg = rgb(colour4), shape = "straight", border = "grey",col=contrast)
add.cluster.boundaries(som, fit_kmeans$cluster, lwd = 3, lty = 2, col=contrast[4])
Can someone please tell me what I am doing wrong? Thanks
Sources: https://www.rdocumentation.org/packages/kohonen/versions/2.0.5/topics/supersom
Answer 1:
getCodes() produces a list, so you have to treat it like one.
Calling getCodes(som) returns a list containing 7 items named a to g, so you should select items from it using either $ or [[ ]], e.g.
plot(som, type = "property", property = getCodes(som)[[1]], main=names(getCodes(som))[1], palette.name=terrain.colors)
or
plot(som, type = "property", property = getCodes(som)$a, main="a", palette.name=terrain.colors)
or
plot(som, type = "property", property = getCodes(som)[["a"]], main="a", palette.name=terrain.colors)
If you must set the variable before calling plot(), you can do so like this:
var <- 1
plot(som, type = "property", property = getCodes(som)[[var]], main=names(getCodes(som))[var], palette.name=terrain.colors)
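The difference between list and matrix indexing can be reproduced without fitting a SOM. The following is a minimal base-R sketch; `codes` is a hypothetical stand-in for the list that getCodes(som) returns here, not the actual object:

```r
# Hypothetical stand-in for a multi-layer getCodes() result:
# a named list holding one codebook matrix per layer.
codes <- list(
  a = matrix(rnorm(100), ncol = 1, dimnames = list(NULL, "a")),
  e = matrix(runif(400), ncol = 4, dimnames = list(NULL, LETTERS[1:4]))
)

# Matrix-style indexing on a list fails, as in the question.
res <- try(codes[, 1], silent = TRUE)
inherits(res, "try-error")   # TRUE: "incorrect number of dimensions"

# List-style indexing returns the layer's matrix.
dim(codes[[1]])              # 100 1
dim(codes$e)                 # 100 4
names(codes)[1]              # "a"
```

Note that a layer built from a factor holds one column per level, so a single column may need to be picked before it is passed as `property`.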
Regarding kmeans()
kmeans() needs a matrix or an object that can be coerced into a numeric matrix. You have factors (categorical data), which cannot be coerced to numeric, so either drop the factors or find another method.
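The failure is easy to reproduce in isolation. This is a small sketch on hypothetical toy data (`df`, `num_cols`, and `fit` are illustrative names, not part of the question's code), showing that kmeans() fails on a data frame containing a factor and runs once only the numeric columns are kept:

```r
set.seed(33)
# Hypothetical toy data: two numeric columns plus one factor.
df <- data.frame(
  a = rnorm(50),
  b = rnorm(50),
  g = factor(sample(c("A", "B"), 50, replace = TRUE))
)

# Fails: the factor column turns into NA when coerced to numeric.
bad <- try(suppressWarnings(kmeans(df, centers = 3)), silent = TRUE)
inherits(bad, "try-error")   # TRUE

# Works: cluster on the numeric columns only.
num_cols <- vapply(df, is.numeric, logical(1))
fit <- kmeans(as.matrix(df[num_cols]), centers = 3)
table(fit$cluster)
```

Selecting columns with is.numeric avoids hard-coding positions such as 1:4.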
drop the factors:
#cluster (factors dropped)
set.seed(33) #for reproducibility
fit_kmeans <- kmeans(as.matrix(data[1:4]), 3) #3 clusters are used, as indicated by the wss development.
cl_assignmentk <- fit_kmeans$cluster[data$unit.classif]
par(mfrow=c(1,1))
plot(som, type="mapping", bg = rgb(colour4), shape = "straight", border = "grey", col=contrast)
add.cluster.boundaries(som, fit_kmeans$cluster, lwd = 3, lty = 2, col=contrast[4])
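One further point: add.cluster.boundaries() expects one cluster label per map unit, while kmeans() on the raw data returns one label per observation, and the winning unit of each observation is stored in the fitted SOM object (unit.classif), not in the data frame. A hedged sketch of clustering the codebook vectors instead, assuming a numeric-only map fitted with som() on hypothetical toy data (`X`, `som_num`, `fit_units`, and `cl_obs` are illustrative names):

```r
library(kohonen)

set.seed(33)
# Hypothetical numeric-only data standing in for the scaled columns a to d.
X <- scale(matrix(rnorm(400 * 4), ncol = 4,
                  dimnames = list(NULL, c("a", "b", "c", "d"))))

som_num <- som(X, grid = somgrid(5, 5, "hexagonal"))

# Codebook vectors of the single layer: one row per map unit (25 x 4).
codes <- getCodes(som_num, idx = 1)

# Cluster the map units, not the observations.
fit_units <- kmeans(codes, centers = 3)

# Cluster of each observation = cluster of its winning unit.
cl_obs <- fit_units$cluster[som_num$unit.classif]

plot(som_num, type = "mapping", shape = "straight")
add.cluster.boundaries(som_num, fit_units$cluster, lwd = 3, lty = 2)
```

This is only one possible approach; with mixed numeric and factor layers, the codebook matrices would first have to be combined or a distance-based clustering used.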
Edit:
Alternatively, you can select a layer's codes directly from getCodes() by using idx, like so:
plot(som, type = "property", property = getCodes(som, idx = 1), main="a", palette.name=terrain.colors)
Source: https://stackoverflow.com/questions/65781259/plot-causes-error-incorrect-number-of-dimensions