R delete rows in data frame where nrow of index is smaller than certain value

眼角桃花 2021-01-25 18:15

I want to delete certain rows in a data frame when the number of rows with the same index is smaller than a pre-specified value.

> fof.6.5[1:15, 1:3]
   draw          


        
3 answers
  • 2021-01-25 18:31

    You could do this using dplyr (assuming your data is in a data frame called dt):

    library(dplyr)
    dt %>% group_by(draw) %>% filter(n() >= 5) %>% ungroup()
    

    Or you could use table or xtabs:

    tab <- xtabs(~ draw, dt)
    
    dt[!dt$draw %in% as.numeric(names(which(tab < 5))), ]
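
    As a rough, self-contained illustration of the table idea, here is a sketch in base R only. The toy data and the name min_rows are made up for the example, and it compares the draw values as character strings instead of converting the table names back to numbers, so it should also work when draw is not numeric:

```r
# Toy data (made up for illustration): draws 1 and 2 have 5 rows each,
# draw 3 has 3 rows, draw 4 has 2 rows
dt <- data.frame(draw = rep(1:4, times = c(5, 5, 3, 2)),
                 Fund.ID = seq_len(15))
min_rows <- 5  # hypothetical name for the pre-specified threshold

# Count rows per draw value; names(tab) holds the draw values as character
tab <- table(dt$draw)

# Draw values that occur at least min_rows times
keep <- names(tab)[tab >= min_rows]

# Keep only the rows belonging to those draw values
kept <- dt[as.character(dt$draw) %in% keep, ]

table(kept$draw)  # only draws 1 and 2 remain
```

    With this toy input, 10 of the 15 rows survive (the two groups of 5).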
    
  • 2021-01-25 18:45

    Here is another option using data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1), group by 'draw', and if the number of rows in a group (.N) is greater than or equal to 'n', return the Subset of Data.table (.SD) for that group.

    library(data.table)
    n <- 5
    setDT(df1)[, if(.N >= n) .SD, by = draw]  
    #    draw Fund.ID Firm.ID
    # 1:    1    1667     666
    # 2:    1    1572     622
    # 3:    1    1392     553
    # 4:    1     248      80
    # 5:    1    3223     332
    # 6:    2    2959    1998
    # 7:    2    2659    1561
    # 8:    2   14233    2517
    # 9:    2   10521   12579
    #10:    2    3742    1045
    

    If we want only the first 'n' rows of each remaining group, use head:

    setDT(df1)[, if(.N >= n) head(.SD, n), by = draw]  
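
    To see the difference between the two calls, here is a small sketch on made-up data where one group has more than 'n' rows. It assumes the data.table package is installed, and it uses as.data.table() (which copies) instead of setDT() so that the toy data frame is not modified in place:

```r
library(data.table)

# Toy data (made up for illustration): group sizes 7, 5 and 3
df1 <- data.frame(draw = rep(1:3, times = c(7, 5, 3)),
                  Fund.ID = seq_len(15))
n <- 5

# Drop groups with fewer than n rows, keep every row of the others
all_rows <- as.data.table(df1)[, if (.N >= n) .SD, by = draw]

# Same filter, but keep only the first n rows of each surviving group
first_n <- as.data.table(df1)[, if (.N >= n) head(.SD, n), by = draw]

# Rows per group in each result
all_rows[, .N, by = draw]
first_n[, .N, by = draw]
```

    Here the first result keeps 7 + 5 = 12 rows, while the second keeps 5 + 5 = 10; draw 3 is dropped in both.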
    
  • 2021-01-25 18:57

    The code below does that:

    library(dplyr)
    fof.6.5 <- data.frame(draw = rep(1:4, c(5, 5, 3, 2)),
                          Fund.ID = rnorm(15),
                          Firm.ID = rnorm(15))
    fof_filter <- fof.6.5 %>% group_by(draw) %>% mutate(freq = n()) %>%
      filter(freq >= 5) %>% select(-freq)
    fof_filter
    
    • group_by groups together the rows that share the same value of draw.
    • mutate adds a new variable "freq" that holds the number of rows in each draw group, repeated on every row of that group.
    • filter keeps only the rows whose "freq" is greater than or equal to 5.
    • select drops the "freq" column because it is no longer needed.
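
    For comparison, the same three steps can be sketched in base R without dplyr. This is only one possible equivalent, swapping in ave() for group_by plus mutate, and it uses deterministic made-up values in place of rnorm:

```r
# Toy data with the same group sizes as the example above (5, 5, 3, 2)
fof <- data.frame(draw = rep(1:4, times = c(5, 5, 3, 2)),
                  Fund.ID = seq_len(15))

# "mutate" step: size of each row's draw group, repeated within the group
fof$freq <- ave(seq_along(fof$draw), fof$draw, FUN = length)

# "filter" step: keep rows whose group has at least 5 rows
fof_filter <- fof[fof$freq >= 5, ]

# "select" step: drop the helper column
fof_filter$freq <- NULL

fof_filter
```

    Unlike the dplyr version, the result is a plain data frame with no grouping attached, so no ungroup() step is needed afterwards.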