Suppose I have a data frame (df1) with factors like this:
factor1 factor2 factor3
------- ------- -------
d a x
d a x
b a
We could use Filter
Filter(function(x) min(nlevels(x))>2, df1)
(based on the results in one of the upvoted posts)
Or it could also be:
Filter(function(x) min(tabulate(x))>2, df1)
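Note that the two Filter calls above test different things: nlevels(x) counts how many distinct levels a factor has, while tabulate(x) counts how often each level occurs. A minimal sketch of the difference, using made-up data (the values in df1 below are an assumption, not the data from the question):

```r
# Hypothetical example data; stringsAsFactors = TRUE makes the columns
# factors on R >= 4.0, where the default is FALSE.
df1 <- data.frame(
  factor1 = c("d", "d", "b", "b", "b", "c"),
  factor2 = c("a", "a", "a", "c", "c", "c"),
  factor3 = c("x", "x", "x", "y", "y", "y"),
  stringsAsFactors = TRUE
)

# Keep columns that have more than 2 distinct levels.
by_levels <- Filter(function(x) nlevels(x) > 2, df1)

# Keep columns in which every level occurs more than 2 times.
by_counts <- Filter(function(x) min(tabulate(x)) > 2, df1)

names(by_levels)
names(by_counts)
```

With this data, factor1 has three levels but one of them occurs only once, so it passes the first test and fails the second; factor2 and factor3 do the opposite.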
Is this what you want?
df <- data.frame(col1 = rep(letters[1:4], each = 3),
                 col2 = rep(letters[1:2], each = 6),
                 col3 = rep(letters[1:3], each = 4),
                 stringsAsFactors = TRUE)  # needed on R >= 4.0 so the columns are factors
df[, sapply(df, function(x) nlevels(x) > 2)]
You could try using lapply and table:
df1[, lapply(c(1,2,3), FUN = function(x) min(table(df1[,x]))) >= 3]
and, a little more generic:
df1[, lapply(1:ncol(df1), FUN = function(x) min(table(df1[,x]))) >= 3]
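Since lapply returns a list, vapply may be a more direct way to build the logical index, and drop = FALSE keeps the result a data frame even when only one column survives. A rough sketch of that variant, again with made-up data (the contents of df1 here are an assumption):

```r
# Hypothetical example data in which only factor2 qualifies.
df1 <- data.frame(
  factor1 = c("d", "d", "b", "b", "b", "c"),
  factor2 = rep("a", 6),
  factor3 = c("x", "x", "y", "y", "y", "z"),
  stringsAsFactors = TRUE
)

# One TRUE/FALSE per column: does the rarest level occur at least 3 times?
keep <- vapply(df1, function(x) min(table(x)) >= 3, logical(1))

# drop = FALSE stops a single remaining column from collapsing to a vector.
result <- df1[, keep, drop = FALSE]
names(result)
```

Without drop = FALSE, df1[, keep] would return a bare factor here rather than a one-column data frame.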
I would create a quick helper function that checks how many instances of each level exist with a quick call to table(). Look at table(df$fac1) to see how this works. Note this isn't very robust, but it should get you started:
df <- data.frame(fac1 = factor(c("d", "d", "b", "b", "b", "c", "c", "c", "c")),
                 fac2 = factor(c("a", "a", "a", "c", "c", "c", "n", "n", "n")),
                 fac3 = factor(c(rep("x", 4), rep("y", 5))),
                 other = 1:9)
at_least_three_instances <- function(column) {
  # Non-factor columns are always kept
  if (!is.factor(column)) {
    return(TRUE)
  }
  # Keep a factor only if its rarest level occurs more than twice
  min(table(column)) > 2
}
df[unlist(lapply(df, at_least_three_instances))]
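One way the helper above is not robust: table() also counts declared levels that never occur (as 0), so a factor with an unused level would be dropped. A possible sketch of a slightly more defensive version follows; the name keep_column and the min_count parameter are hypothetical, not part of the original answer:

```r
# Hypothetical helper: TRUE if the column should be kept.
keep_column <- function(column, min_count = 3) {
  # Non-factor columns are always kept.
  if (!is.factor(column)) return(TRUE)
  # Ignore levels that are declared but never used.
  counts <- table(droplevels(column))
  # A factor with no observed levels (e.g. all NA) is dropped.
  length(counts) > 0 && min(counts) >= min_count
}

df <- data.frame(
  fac1 = factor(c("d", "d", "b", "b", "b", "c", "c", "c", "c")),
  fac2 = factor(c("a", "a", "a", "c", "c", "c", "n", "n", "n"),
                levels = c("a", "c", "n", "unused")),
  other = 1:9
)

result <- df[vapply(df, keep_column, logical(1))]
names(result)
```

Here fac1 is dropped because "d" occurs only twice, while fac2 is kept despite its unused level.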