Randomly split data by criterion into training and testing data set using R

前端未结

Gidday,

I'm looking for a way to randomly split a data frame (e.g. a 90/10 split) into training and testing sets for a model while respecting a certain grouping criterion.

2 Answers

清歌不尽

2020-12-20 03:11

Assuming no conditions on what groups you want, the following will randomly split your data frame into 90% and 10% partitions (stored in a list):

```
set.seed(1)
split(test, sample(1:nrow(test) > round(nrow(test) * .1)))
```

Produces:

```
$`FALSE`
   companycode year  expenses
10          C3    6  760.4874
12          C4    1 4565.7831

$`TRUE`
   companycode year    expenses
1           C1    1     8.47720
2           C1    2     8.45250
3           C1    3     8.46280
4           C2    1 14828.90603
5           C3    1   665.21565
6           C3    2   290.66596
7           C3    3   865.56265
8           C3    4  6785.03586
9           C3    5   312.02617
11          C3    7  1155.76758
13          C4    2  3340.36540
14          C4    3  2656.73030
15          C4    4  1079.46098
16          C5    1    60.57039
17          C6    1  6282.48118
18          C6    2  7419.32720
19          C7    1   644.90571
20          C8    1 58332.34945
```
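
The two list elements are named `FALSE` (the roughly 10% hold-out) and `TRUE` (the remaining 90%). A minimal sketch of pulling them out by name might look like the following; the data frame contents and the names `in_train`, `parts`, `holdout` and `training` are made up for illustration and are not part of the original answer.

```r
set.seed(1)

# Hypothetical stand-in for the data frame called `test` in the answer
test <- data.frame(id = 1:20, value = rnorm(20))

# Logical vector with round(10%) FALSE entries, shuffled into random order
in_train <- sample(seq_len(nrow(test)) > round(nrow(test) * 0.1))

# split() returns a list with one data frame per value of in_train
parts <- split(test, in_train)

holdout  <- parts[["FALSE"]]  # about 10% of the rows
training <- parts[["TRUE"]]   # the remaining rows
```

Note that this split works row by row, so rows of the same company can land in both partitions.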

长情又很酷

2020-12-20 03:26
```
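# Unique company codes (works for factor and character columns alike)
comps <- unique(as.character(df$companycode))

# Randomly pick about 90% of the companies for training
trn <- sample(comps, round(length(comps) * 0.9))

# Rows follow their company, so no company ends up in both sets
df.trn <- subset(df, companycode %in% trn)
df.tst <- subset(df, !(companycode %in% trn))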
```
This splits your data so that 90% of companies are in the training set and the rest in the test set.

This does not guarantee that 90% of your rows will be training and 10% test, because companies contribute different numbers of rows. The rigorous way to achieve this is left as an exercise for the reader. The non-rigorous way would be to repeat the sampling until you get proportions that are roughly correct.
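
The non-rigorous approach could be sketched roughly as follows. The example data, the tolerance of 0.06 and the cap of 1000 attempts are all assumptions chosen for illustration, not values from the answer.

```r
set.seed(42)

# Hypothetical example data: 8 companies with unequal numbers of rows
df <- data.frame(
  companycode = rep(paste0("C", 1:8), times = c(3, 1, 7, 4, 1, 2, 1, 1)),
  year        = c(1:3, 1, 1:7, 1:4, 1, 1:2, 1, 1),
  expenses    = runif(20, 1, 10000)
)

comps     <- unique(as.character(df$companycode))
n_trn     <- round(length(comps) * 0.9)  # number of companies for training
target    <- 0.9    # desired share of rows in the training set
tolerance <- 0.06   # accepted deviation from the target (assumed value)
max_tries <- 1000   # give up eventually instead of looping forever

for (i in seq_len(max_tries)) {
  trn  <- sample(comps, n_trn)
  frac <- mean(df$companycode %in% trn)  # share of rows in this draw
  if (abs(frac - target) <= tolerance) break
}
if (abs(frac - target) > tolerance) stop("no split within tolerance found")

df.trn <- df[df$companycode %in% trn, ]
df.tst <- df[!(df$companycode %in% trn), ]
```

Every company still ends up entirely in one set; the loop only rejects draws whose row proportions are too far from the target. With very few or very unbalanced groups, no draw may come close, which is why the sketch stops after a fixed number of attempts.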