Take the subsets of a data.frame with the same feature and select a single row from each subset


Suppose I have a matrix in R as follows:

ID Value
1 10
2 5
2 8
3 15
4 7
4 9
...

What I need is a random sample where every element is repre

3 Answers

不要未来只要你来

2020-12-11 13:09
The idea is to reorder the rows randomly and then remove duplicates in that order, so the first row encountered for each ID is the one that is kept.
```
df <- read.table(text="ID Value
1 10
2 5
2 8
3 15
4 7
4 9", header=TRUE)

df2 <- df[sample(nrow(df)), ]
df2[!duplicated(df2$ID), ]
```
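For reproducible draws, the same shuffle-then-deduplicate idea can be wrapped in a small helper. This is only a sketch: the function name `sample_one_per_id` and its `seed` argument are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper: returns one randomly chosen row per ID.
sample_one_per_id <- function(df, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)                 # optional, for reproducibility
  shuffled <- df[sample(nrow(df)), , drop = FALSE]   # put rows in random order
  picked <- shuffled[!duplicated(shuffled$ID), , drop = FALSE]  # keep first row seen per ID
  picked[order(picked$ID), , drop = FALSE]           # sort by ID for readability
}

df <- read.table(text = "ID Value
1 10
2 5
2 8
3 15
4 7
4 9", header = TRUE)

res <- sample_one_per_id(df, seed = 42)
res
```

Passing the same `seed` gives the same selection each time; leaving it as `NULL` keeps the draw random.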
栀梦

2020-12-11 13:17
Use `tapply` across the row names and draw a sample of size 1 within each ID group:
```
dat[tapply(rownames(dat),dat$ID,FUN=sample,1),]

#  ID Value
#1  1    10
#3  2     8
#4  3    15
#6  4     9
```
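If your data is truly a matrix and not a data.frame, `dat$ID` will not work and there may be no row names, so select the ID column by name and convert the sampled row numbers back to integers:
```
dat[as.integer(tapply(as.character(seq_len(nrow(dat))), dat[, "ID"], FUN = sample, 1)), ]
```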
Don't be tempted to remove the `as.character`: when `sample` is given a single number `n`, it samples from `1:n` instead of returning `n`, which gives unintended results for groups with only one row. E.g.
```
replicate(10, sample(4,1) )
#[1] 1 1 4 2 1 2 2 2 3 4
```
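One way to sidestep this pitfall without converting to character is to sample positions rather than values, following the `resample` pattern shown in the examples of the `?sample` help page. The sketch below assumes a numeric matrix with a column named `ID`; the `dat` object is constructed here purely for illustration.

```r
# Sample from the elements of x itself, even when length(x) == 1.
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Example matrix (constructed for illustration) with the question's columns.
dat <- cbind(ID = c(1, 2, 2, 3, 4, 4), Value = c(10, 5, 8, 15, 7, 9))

# Group the row numbers by ID, then pick one row number per group.
rows_by_id <- split(seq_len(nrow(dat)), dat[, "ID"])
picked <- vapply(rows_by_id, resample, integer(1), size = 1)

dat[picked, , drop = FALSE]
```

Because `resample` indexes into `x`, a group with a single row number such as `4` always returns `4` rather than a draw from `1:4`.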
囚心锁ツ

2020-12-11 13:24
You can do that with dplyr like so:
```
library(dplyr)
df %>% group_by(ID) %>% sample_n(1)
```
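In dplyr 1.0.0 and later, `sample_n()` is superseded by `slice_sample()`. Assuming a recent dplyr is installed, an equivalent sketch looks like this (the `df` below is rebuilt from the question's data so the snippet runs on its own):

```r
library(dplyr)

df <- data.frame(ID = c(1, 2, 2, 3, 4, 4), Value = c(10, 5, 8, 15, 7, 9))

res <- df %>%
  group_by(ID) %>%        # one group per ID
  slice_sample(n = 1) %>% # draw one random row from each group
  ungroup()               # drop the grouping from the result

res
```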