Question
I am looking for something similar to "Select only the first rows for each unique value of a column in R", but I need to keep ALL rows containing the first value of year per ID. In other words, I need to subset the dataset on the first year listed for each individual ID. An ID can have its first year in 1, 2, or 3, and all of the rows in that first year should be retained. For example:
ID <- c("54V", "54V", "54V", "54V", "56V", "56V", "56V", "59V", "59V", "59V")
yr <- c(1, 1, 1, 2, 2, 2, 3, 1, 2, 3)
test <- data.frame(ID,yr)
test
ID yr
1 54V 1
2 54V 1
3 54V 1
4 54V 2
5 56V 2
6 56V 2
7 56V 3
8 59V 1
9 59V 2
10 59V 3
The expected result:
ID yr
1 54V 1
2 54V 1
3 54V 1
4 56V 2
5 56V 2
6 59V 1
My dataset has many columns and I need to retain them all. Any pointers using R or sqldf in R would be helpful!
Answer 1:
We can do this with dplyr:
library(dplyr)
test %>%
  group_by(ID) %>%            # one group per individual
  filter(yr == first(yr))     # keep every row matching the group's first yr
# ID yr
# <fctr> <dbl>
#1 54V 1
#2 54V 1
#3 54V 1
#4 56V 2
#5 56V 2
#6 59V 1
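first(yr) compares each row against the yr on the first row of its ID, which is exactly the question's "first year listed". If the rows are not sorted by year within ID and "first" should instead mean the earliest year, one option is to compare against the minimum. The following is a minimal base R sketch of that variant using aggregate() and merge(), the same GROUP BY plus JOIN logic one would write as an SQL query for sqldf. The names first_yr and res_min are illustrative, and the block rebuilds test from the question so it runs on its own.

```r
# Example data from the question
ID <- c("54V", "54V", "54V", "54V", "56V", "56V", "56V", "59V", "59V", "59V")
yr <- c(1, 1, 1, 2, 2, 2, 3, 1, 2, 3)
test <- data.frame(ID, yr)

# Earliest year per ID (like SELECT ID, MIN(yr) ... GROUP BY ID)
first_yr <- aggregate(yr ~ ID, data = test, FUN = min)

# Inner join on both ID and yr: keeps every row from each ID's earliest
# year, together with all other columns of test
res_min <- merge(test, first_yr, by = c("ID", "yr"))
print(res_min)
```

merge() returns the rows sorted by ID and yr with fresh row names; any extra columns of test follow ID and yr in the result.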
Or using data.table:
library(data.table)
# setDT() converts test to a data.table in place; .SD holds each ID's non-grouping columns
setDT(test)[, .SD[yr == yr[1L]], by = ID]
Or using base R:
# flag the rows whose yr equals the first yr of their ID, then subset on those rows
test[with(test, as.logical(ave(yr, ID, FUN = function(x) x == x[1L]))), ]
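The as.logical() wrapper is needed because ave() writes FUN's results back into a copy of its first argument, so the TRUE/FALSE values from the comparison are coerced to yr's numeric type. A small sketch on the question's data that prints this intermediate step; flag, keep and res_ave are illustrative names only.

```r
# Example data from the question
ID <- c("54V", "54V", "54V", "54V", "56V", "56V", "56V", "59V", "59V", "59V")
yr <- c(1, 1, 1, 2, 2, 2, 3, 1, 2, 3)
test <- data.frame(ID, yr)

# Within each ID, compare every yr to that ID's first yr; the logical
# results are stored back into a numeric vector, giving 1/0
flag <- ave(test$yr, test$ID, FUN = function(x) x == x[1L])
print(flag)  # [1] 1 1 1 0 1 1 0 1 0 0

# Turn 1/0 back into TRUE/FALSE and use it as the row index
keep <- as.logical(flag)
res_ave <- test[keep, ]
print(res_ave)
```

The subset keeps the original row names 1, 2, 3, 5, 6 and 8, so its row labels differ from the renumbered expected output shown in the question.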
Source: https://stackoverflow.com/questions/42551449/extract-all-rows-containing-first-value-for-each-unique-value-of-another-column