Stratified splitting of the data
Question

I have a large data set and would like to fit a separate logistic regression for each City, which is one of the columns in my data. The following 70/30 split works, but it does not take the City group into account:

    indexes <- sample(1:nrow(data), size = 0.7*nrow(data))
    train <- data[indexes,]
    test <- data[-indexes,]

But this does not guarantee a 70/30 split for each city. Let's say that I have City A and City B, where City A has 100 rows and City B has 900 rows, totaling 1000 rows. Splitting the data with the above code will give