Extract all rows containing first value for each unique value of another column

Question

I am looking for something similar to "Select only the first rows for each unique value of a column in R", but I need to keep ALL rows containing the first value of year per ID. In other words, I need to subset the dataset on the first year listed for each individual ID. An ID can have its first year be 1, 2, or 3, and all of the rows in that first year should be retained. For example:

  ID <- c("54V", "54V", "54V", "54V", "56V", "56V", "56V", "59V", "59V", "59V")
  yr <- c(1, 1, 1, 2, 2, 2, 3, 1, 2, 3)
  test <- data.frame(ID,yr)
  test

    ID yr
1  54V  1
2  54V  1
3  54V  1
4  54V  2
5  56V  2
6  56V  2
7  56V  3
8  59V  1
9  59V  2
10 59V  3

The expected result:

    ID yr
1  54V  1
2  54V  1
3  54V  1
5  56V  2
6  56V  2
8  59V  1

My dataset has many columns and I need to retain them all. Any directions with R or sqldf in R are helpful!

Answer 1:

We can do this with dplyr by grouping on ID and keeping the rows whose yr equals the first yr in each group:

library(dplyr)
test %>% 
    group_by(ID) %>%
    filter(yr==first(yr))
#   ID    yr
#  <fctr> <dbl>
#1    54V     1
#2    54V     1
#3    54V     1
#4    56V     2
#5    56V     2
#6    59V     1
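One point worth keeping in mind: first(yr) takes the year from the first row of each group as listed, so the result depends on row order. If the rows might not be sorted by year within each ID, a variant with min(yr) may be closer to what is wanted. A minimal sketch, assuming "first year" means the smallest year; the shuffled data frame below is hypothetical and not from the question:

```r
library(dplyr)

# Hypothetical data: same kind of IDs as in the question, but rows out of order
shuffled <- data.frame(
  ID = c("54V", "54V", "54V", "54V", "56V", "56V", "56V"),
  yr = c(2, 1, 1, 1, 3, 2, 2)
)

# Keep every row whose year is the smallest year seen for its ID
earliest <- shuffled %>%
  group_by(ID) %>%
  filter(yr == min(yr)) %>%
  ungroup()

earliest
```

With first(yr) the same data would keep only the yr 2 row for 54V and the yr 3 row for 56V, because those happen to be listed first.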

Or using data.table, subsetting each group's .SD to the rows that match the group's first yr:

library(data.table)
setDT(test)[, .SD[yr==yr[1L]], ID]

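Or using base R, where ave() flags the rows matching each ID's first yr (as.logical is needed because ave() returns the flags as numeric 0/1 when yr is numeric):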

test[with(test, as.logical(ave(yr, ID, FUN = function(x) x==x[1L]))),]
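The base R one-liner is fairly dense, so here is a step-by-step sketch of the same idea. It swaps in match() instead of ave(), so it is an alternative rather than the approach above; the intermediate names first_row, first_yr and result are made up for illustration.

```r
# Data from the question
ID <- c("54V", "54V", "54V", "54V", "56V", "56V", "56V", "59V", "59V", "59V")
yr <- c(1, 1, 1, 2, 2, 2, 3, 1, 2, 3)
test <- data.frame(ID, yr)

# For every row, the index of the first row that has the same ID
first_row <- match(test$ID, test$ID)

# The year found on that first row, repeated for every row of the same ID
first_yr <- test$yr[first_row]

# Keep all rows whose year equals the first listed year of their ID;
# every other column of the data frame is retained
result <- test[test$yr == first_yr, ]
result   # keeps original rows 1, 2, 3, 5, 6, 8
```

Like the other solutions, this relies on "first" meaning the first row listed for each ID, so the data should already be ordered by year within ID.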

Source: https://stackoverflow.com/questions/42551449/extract-all-rows-containing-first-value-for-each-unique-value-of-another-column

Tags

subset

sqldf
