How to make a graph of clustered boolean variables in R?

眉间皱痕 posted on 2021-02-11 04:25:59

Question


I have a dataset that consists entirely of boolean variables, exactly like the transformed animals dataset below, only with many more columns.

# http://stats.stackexchange.com/questions/27323/cluster-analysis-of-boolean-vectors-in-r
library(cluster)
head(mona(animals)[[1]])

    war fly ver end gro hai
ant   0   0   0   0   1   0
bee   0   1   0   0   1   1
cat   1   0   1   0   0   1
cpl   0   0   0   0   0   1
chi   1   0   1   1   1   1
cow   1   0   1   0   1   1

The goal is to rearrange the rows in such a way that groupings of similar membership patterns are easier to identify visually.

I figured some kind of clustering algorithm would probably be the way to go but I'm not sure what functions to use or how to go about it exactly.

Ideally, the table would be graphed as a kind of checkerboard, with each square shaded according to whether that point is true or false.


Answer 1:


This solution uses hierarchical clustering to reorder the rows. Note that it does not scale well to large numbers of observations, because the dissimilarity matrix grows quadratically with the number of rows and quickly becomes too big. An alternative algorithm for many observations was suggested in this answer, but I didn't fully understand it or see how to implement it based on the chapter referenced.
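To make the scaling concern concrete, here is a rough back-of-the-envelope sketch. It assumes each pairwise dissimilarity is stored once as an 8-byte double; the helper names `n_pairs` and `approx_gb` are hypothetical and not part of any package.

```r
# Number of pairwise dissimilarities stored for n observations
n_pairs <- function(n) n * (n - 1) / 2

# Approximate storage in gigabytes, assuming 8 bytes per dissimilarity
approx_gb <- function(n) n_pairs(n) * 8 / 1024^3

n_pairs(20)     # the animals data has 20 rows: 190 pairs
approx_gb(1e5)  # 100,000 rows: roughly 37 GB
```

So the approach is comfortable for a few thousand rows but likely impractical for hundreds of thousands.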

library(cluster)
library(reshape2)
library(ggplot2)

# testing that it works using the categorical animals dataset
adData <- mona(animals)$data

# import the data, encoded with 0s and 1s for membership
# adData  <- read.csv('adData.csv')

# clustering based off this answer https://stats.stackexchange.com/a/48364
# create a dissimilarity matrix
# (for all-numeric 0/1 columns, daisy defaults to Euclidean distance)
disimilarAdData <- daisy(adData)

# hierarchically cluster by dissimilarity
clusteredAdData <- agnes(disimilarAdData)

# reorder the rows using the ordering produced by the clustering
# ($order is the first component of the agnes object)
orderedAdData <- adData[clusteredAdData$order, ]

# make it logical data type for better graphing
plotData <- sapply(as.data.frame(orderedAdData), as.logical)
row.names(plotData) <- row.names(orderedAdData)

# plot graph using shaded rows
# http://stackoverflow.com/questions/21316363/plot-and-fill-chessboard-like-area-and-the-similars-in-r
ggplot(melt(plotData), aes(x=Var2, y=Var1, fill=value)) + geom_tile()
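The code above only reorders the rows. As a possible variation (not part of the original answer), the sketch below reorders both rows and columns using base R's `dist` and `hclust` instead of the cluster package. The toy matrix `x` is hypothetical data made up for illustration, and Manhattan distance is an assumed choice: for 0/1 data it simply counts the columns in which two rows differ.

```r
# Toy 0/1 membership matrix (hypothetical data, not from the question)
x <- matrix(
  c(1, 1, 0, 0,
    0, 0, 1, 1,
    1, 1, 0, 1,
    0, 0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("a", "b", "c", "d"), c("v1", "v2", "v3", "v4"))
)

# Cluster the rows; for 0/1 data, Manhattan distance is the number of
# columns in which two rows differ
row_order <- hclust(dist(x, method = "manhattan"))$order

# Cluster the columns the same way by transposing the matrix
col_order <- hclust(dist(t(x), method = "manhattan"))$order

# Reorder both dimensions so similar rows and similar columns sit together
x_ordered <- x[row_order, col_order]
x_ordered
```

The reordered matrix can then be converted to logical values and passed to `melt` and `geom_tile` in the same way as `plotData`.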



Source: https://stackoverflow.com/questions/35667964/how-to-make-a-graph-of-clustered-boolean-variables-in-r
