add clusters and nodes from SOMbrero package to training data

Submitted by 三世轮回 on 2020-01-04 07:04:30

Question


I am playing a bit with the SOMbrero package. I would like to attach the cluster numbers created like so (taken from here):

my.sc <- superClass(iris.som, k=3)

and X and Y coordinates of the SOM nodes to the training dataset.

In some code, where I use the kohonen package, I create clusters like this:

range01 <- function(x){(x-min(x))/(max(x)-min(x))}

ind <- sapply(SubsetData, is.numeric)
SubsetData[ind] <- lapply(SubsetData[ind], range01)

TrainingMatrix <- as.matrix(SubsetData)

GridDefinition <- somgrid(xdim = 4, ydim = 4, topo = "rectangular", toroidal = FALSE)

SomModel <- som(
    data = TrainingMatrix,
    grid = GridDefinition,
    rlen = 10000,
    alpha = c(0.05, 0.01),
    keep.data = TRUE
)

nb <- table(SomModel$unit.classif)
groups = 5
tree.hc = cutree(hclust(d=dist(SomModel$codes[[1]]),method="ward.D2",members=nb),groups)

plot(SomModel, type="codes", bgcol=rainbow(groups)[tree.hc])

add.cluster.boundaries(SomModel, tree.hc)
result <- OrginalData
result$Cluster <- tree.hc[SomModel$unit.classif]
result$X <- SomModel$grid$pts[SomModel$unit.classif,"x"]
result$Y <- SomModel$grid$pts[SomModel$unit.classif,"y"]

write.table(result, file = "FinalData.csv", sep = ",", col.names = NA, quote = FALSE)

PS:

Some example code using the iris dataset can be found here.

PPS:

I played a bit with the iris code quoted above and think I have managed to extract the clusters, node ids and prototypes (see code below). What is missing are the X and Y coordinates. I think they are in here:

iris.som$parameters$the.grid$coord

Code:

library(SOMbrero)

set.seed(100)
setwd("D:\\RProjects\\Clustering")

#iris.som <- trainSOM(x.data=iris[,1:4],dimension=c(10,10), maxit=100000, scaling="unitvar", radius.type="gaussian")
iris.som <- trainSOM(x.data=iris[,1:4],dimension=c(3,3), maxit=100000, scaling="unitvar", radius.type="gaussian")

# perform a hierarchical clustering
## with 3 super clusters
iris.sc <- superClass(iris.som, k=3)
summary(iris.sc)

# compute the projection quality indicators
quality(iris.som)

iris1 <- iris
iris1$Cluster = iris.sc$cluster[iris.sc$som$clustering]
iris1$Node = iris.sc$som$clustering
iris1$Pt1Sepal.Length = iris.sc$som$prototypes[iris.sc$som$clustering,1]
iris1$Pt2Sepal.Width = iris.sc$som$prototypes[iris.sc$som$clustering,2]
iris1$Pt3Petal.Length = iris.sc$som$prototypes[iris.sc$som$clustering,3]
iris1$Pt4Petal.Width = iris.sc$som$prototypes[iris.sc$som$clustering,4]

write.table(iris1, file = "Iris.csv", sep = ",", col.names = NA, quote = FALSE)

Answer 1:


I think I have figured it out using the iris example (please correct/improve code! - I am not fluent in R):

library(SOMbrero)

set.seed(100)
setwd("D:\\RProjects\\SomBreroClustering")

iris.som <- trainSOM(x.data=iris[,1:4],dimension=c(5,5), maxit=10000, scaling="unitvar", radius.type="letremy")

# perform a hierarchical clustering
# with 3 super clusters
iris.sc <- superClass(iris.som, k=3)
summary(iris.sc)

# compute the projection quality indicators
quality(iris.som)

iris1 <- iris
iris1$Cluster = iris.sc$cluster[iris.sc$som$clustering]
iris1$Node = iris.sc$som$clustering
iris1$Pt1Sepal.Length = iris.sc$som$prototypes[iris.sc$som$clustering,1]
iris1$Pt2Sepal.Width = iris.sc$som$prototypes[iris.sc$som$clustering,2]
iris1$Pt3Petal.Length = iris.sc$som$prototypes[iris.sc$som$clustering,3]
iris1$Pt4Petal.Width = iris.sc$som$prototypes[iris.sc$som$clustering,4]
iris1$X = iris.som$parameters$the.grid$coord[iris.sc$som$clustering,1]
iris1$Y = iris.som$parameters$the.grid$coord[iris.sc$som$clustering,2]

write.table(iris1, file = "Iris.csv", sep = ",", col.names = NA, quote = FALSE)
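Every assignment above follows the same pattern: a per-node object is indexed by the per-observation node id. A minimal sketch of that pattern, which also looks up all prototype columns in one step instead of one column at a time, is shown below. It uses small stand-in objects with hypothetical values so it runs without SOMbrero; the names `clustering`, `node_sc`, `coord`, `prototypes` and `obs` are assumptions made for illustration, standing in for `iris.som$clustering`, `iris.sc$cluster`, `iris.som$parameters$the.grid$coord`, `iris.som$prototypes` and `iris`.

```r
# Stand-in objects with the same shapes as the SOMbrero results above
# (hypothetical values; in real use take them from iris.som / iris.sc).
clustering <- c(1, 4, 2, 2, 3)                   # node id per observation
node_sc    <- c(1, 1, 2, 2)                      # super cluster per node
coord      <- cbind(x = c(1, 1, 2, 2),           # grid coordinates, one row per node
                    y = c(1, 2, 1, 2))
prototypes <- matrix((1:8) / 10, nrow = 4,       # prototypes, one row per node
                     dimnames = list(NULL, c("Sepal.Length", "Sepal.Width")))
obs <- data.frame(Sepal.Length = c(5.1, 6.3, 4.9, 5.0, 7.0),
                  Sepal.Width  = c(3.5, 2.9, 3.0, 3.4, 3.2))

# Look up all prototype columns at once and prefix their names.
proto <- as.data.frame(prototypes[clustering, , drop = FALSE])
names(proto) <- paste0("Pt.", names(proto))

# Bind the observation-level lookups onto the original data.
out <- cbind(obs,
             Node    = clustering,
             Cluster = node_sc[clustering],
             X       = coord[clustering, "x"],
             Y       = coord[clustering, "y"],
             proto)
print(out)
```

The prefix `Pt.` is only a naming choice that keeps the prototype columns from clashing with the original variable names.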



Answer 2:


I am not sure that I got it right but:

  1. iris.som$parameters$the.grid$coord contains the coordinates of the clusters (the map nodes); it is a two-column array with the x and y coordinates in the mapping space.
  2. So I think that what you want to do is

    out.grid <- as.data.frame(iris.som$parameters$the.grid$coord)
    out.grid$sc <- iris.sc$cluster

and export out.grid (a three-column table with one row per node). iris.sc$som$prototypes contains the coordinates of the prototypes of the clusters, but in the original space (the four-dimensional space in which the iris dataset takes its values).
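The node-level table described above has one row per node, whereas the question asks for one row per observation. A minimal sketch of joining the two by node id follows. It uses stand-in objects with hypothetical values so it runs without SOMbrero; `coord`, `node_sc` and `clustering` are assumed names standing in for `iris.som$parameters$the.grid$coord`, `iris.sc$cluster` and `iris.som$clustering`.

```r
# Stand-ins (hypothetical values) for a 2x2 map:
coord      <- cbind(x = c(1, 1, 2, 2),   # grid coordinates, one row per node
                    y = c(1, 2, 1, 2))
node_sc    <- c(1, 1, 2, 2)              # super cluster per node
clustering <- c(3, 1, 1, 4)              # node id per observation

# Adding a column with `$` needs a data frame, so convert the matrix first.
out.grid <- as.data.frame(coord)
out.grid$sc <- node_sc
out.grid$node <- seq_len(nrow(out.grid))

# Attach the node-level information to each observation by node id.
obs <- data.frame(obs_id = seq_along(clustering), node = clustering)
joined <- merge(obs, out.grid, by = "node", sort = FALSE)
joined <- joined[order(joined$obs_id), ]   # restore the original row order
print(joined)
```

Nodes that win no observation (node 2 in this toy example) simply do not appear in the joined table.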




Answer 3:


I think my answer captures the requirements: adding the node ids, X and Y coordinates, cluster and prototypes to the original data. Would you agree?

yes :)



Source: https://stackoverflow.com/questions/48843491/add-clusters-and-nodes-from-sombrero-package-to-training-data
