Check frequency of data.table value in other data.table

名媛妹妹 2021-02-10 06:14
    library(data.table)
    DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
    DT2 <- data.table(group = c("A", "B", "C"))

2 Answers
  • 2021-02-10 06:31

    This is how I would do it: first count the number of times each group appears in DT1, then simply join DT2 and DT1.

    library(data.table)
    DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
    DT2 <- data.table(group = c("A", "B", "C"))
    
    # solution:
    # count the rows in each group (adds a num_counts column to DT1 by reference)
    DT1[, num_counts := .N, by = group]
    setkey(DT1, group)
    setkey(DT2, group)
    # mult = "last" keeps one matching DT1 row per group; a group absent from DT1 would give NA
    DT2 = DT1[DT2, mult = "last"][, list(group, popular = (num_counts >= 2))]
    
    #> DT2
    #   group popular
    #1:     A    TRUE
    #2:     B    TRUE
    #3:     C   FALSE
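
    A variant of the same count-then-join idea, sketched under the assumption of data.table 1.9.6 or later (needed for the on = argument): build the counts in a separate table so DT1 is not modified by reference, and treat groups missing from DT1 as zero occurrences instead of NA. The names counts and res are just illustrative.

    ```r
    library(data.table)  # assumes data.table >= 1.9.6 for the on = argument

    DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
    DT2 <- data.table(group = c("A", "B", "C"))

    # Count rows per group in a separate table, leaving DT1 unmodified
    counts <- DT1[, .N, by = group]

    # Join the counts onto DT2 (result keeps DT2's row order);
    # a group absent from DT1 would get N = NA here
    res <- counts[DT2, on = "group"]

    # Treat missing groups as zero occurrences, then drop the helper column
    res[, popular := !is.na(N) & N >= 2L]
    res[, N := NULL]
    print(res)
    ```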
    
  • 2021-02-10 06:34

    I'd just do it this way:

    ## requires data.table 1.9.4+
    setkey(DT1, group)
    DT1[J(DT2$group), list(popular = .N >= 2L), by = .EACHI]
    #    group popular
    # 1:     A    TRUE
    # 2:     B    TRUE
    # 3:     C   FALSE
    # 4:     D   FALSE ## only if DT2 also contains "D" (the updated example)
    

    data.table's join syntax is quite powerful: while joining, you can also aggregate, select or update columns in j. Here we perform a join: for each value in DT2$group, we compute the j-expression .N >= 2L on the matching rows of DT1. Specifying by = .EACHI (see the 1.9.4 NEWS) makes j run once for each row of i, rather than once over all matched rows together.
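
    To see what by = .EACHI changes, here is a small comparison, sketched assuming data.table 1.9.4 or later (total and per_group are illustrative names): without it, j is evaluated once over all joined rows; with it, once per value of DT2$group.

    ```r
    library(data.table)  # assumes data.table >= 1.9.4 for by = .EACHI

    DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
    DT2 <- data.table(group = c("A", "B", "C"))
    setkey(DT1, group)

    # Without by = .EACHI: j runs once over all matched rows (2 + 3 + 1 of them)
    total <- DT1[J(DT2$group), .N]

    # With by = .EACHI: j runs once per row of i, i.e. once per group in DT2
    per_group <- DT1[J(DT2$group), .N, by = .EACHI]
    print(total)
    print(per_group)
    ```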


    Since 1.9.4, .() can be used as an alias in i, j and by (for J() in i, and for list() in j and by). So you could also write:

    DT1[.(DT2$group), .(popular = .N >= 2L), by = .EACHI]
    

    When you join on a single character key column, you can drop the .() / J() wrapper altogether for convenience, so this can also be written as:

    DT1[DT2$group, .(popular = .N >= 2L), by = .EACHI]
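
    Since data.table 1.9.6, the on = argument lets you do the same join without calling setkey() first, which also leaves DT1's row order untouched. A sketch, where the extra group "D" in DT2 is a hypothetical addition to show a group that never occurs in DT1:

    ```r
    library(data.table)  # assumes data.table >= 1.9.6 for the on = argument

    DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
    # "D" is a hypothetical extra group that does not appear in DT1
    DT2 <- data.table(group = c("A", "B", "C", "D"))

    # Ad hoc join on "group": no key is set, so DT1 is not re-sorted
    res <- DT1[DT2, .(popular = .N >= 2L), on = "group", by = .EACHI]
    print(res)
    ```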
    