subsetting based on number of observations in a factor variable
how do you subset based on the number of observations of the levels of a factor variable? I have a dataset with 1,000,000 rows and nearly 3000 levels, and I want to subset out the levels with less say 200 observations. data <- read.csv("~/Dropbox/Shared/data.csv", sep=";") summary(as.factor(data$factor) 10001 10002 10003 10004 10005 10006 10007 10009 10010 10011 10012 10013 10014 10016 10017 10018 10019 10020 414 741 2202 205 159 591 194 678 581 774 778 738 1133 997 381 157 522 6 10021 10022 10023 10024 10025 10026 10027 10028 10029 10030 10031 10032 10033 10034 10035 10036 10037 10038 398 416