Frequency table including zeros for unused values, on a data.table

后端 未结 2 1857
清酒与你
清酒与你 2021-01-02 16:05

I have a data set that is as follows:

library(data.table)

test <- data.table(structure(list(Issue.Date = structure(c(16041, 16056, 16042,15990, 15996, 16         


        
相关标签:
2条回答
  • 2021-01-02 16:21

    To directly get the counts for each group:

    setkey(test, Month, Complaint)
    
    # may need to also add allow.cartesian, depending on actual data
    test[CJ(Month, Complaint, unique = TRUE), .N, by = .EACHI]
    #      Month Complaint N
    # 1: 2013-10         A 0
    # 2: 2013-10         B 0
    # 3: 2013-10         C 5
    # 4: 2013-10         D 4
    # 5: 2013-10         E 0
    # 6: 2013-11         A 1
    # 7: 2013-11         B 0
    # 8: 2013-11         C 4
    # 9: 2013-11         D 0
    #10: 2013-11         E 0
    #11: 2013-12         A 1
    #12: 2013-12         B 0
    #13: 2013-12         C 0
    #14: 2013-12         D 2
    #15: 2013-12         E 0
    #16: 2014-01         A 0
    #17: 2014-01         B 0
    #18: 2014-01         C 1
    #19: 2014-01         D 0
    #20: 2014-01         E 1
    #21: 2014-02         A 2
    #22: 2014-02         B 0
    #23: 2014-02         C 6
    #24: 2014-02         D 2
    #25: 2014-02         E 0
    #26: 2014-03         A 1
    #27: 2014-03         B 2
    #28: 2014-03         C 3
    #29: 2014-03         D 0
    #30: 2014-03         E 0
    #      Month Complaint N
    

    See first revision of the answer if you want to have the counts in the full data.table instead of summarizing.

    0 讨论(0)
  • 2021-01-02 16:33

    It sounds like you perhaps need to use expand.grid to "fill in" your data.table:

    EG <- data.table(expand.grid(Complaint = unique(test$Complaint), 
                                 Month = unique(test$Month)), 
                     key = "Complaint,Month")
    

    Then, you can merge:

    setkey(test, Complaint, Month)
    Full <- merge(test, EG, all.y = TRUE)
    

    And count like this:

    Full[ , list(sum(!is.na(Issue.Date))), by = "Month,Complaint"]
    #       Month Complaint V1
    #  1: 2013-11         A  1
    #  2: 2013-12         A  1
    #  3: 2014-02         A  2
    #  4: 2014-03         A  1
    #  5: 2013-10         A  0
    #  6: 2014-01         A  0
    #  7: 2013-11         B  0
    #  8: 2013-12         B  0
    # ::: S N I P :::
    # 24: 2014-01         D  0
    # 25: 2013-11         E  0
    # 26: 2013-12         E  0
    # 27: 2014-02         E  0
    # 28: 2014-03         E  0
    # 29: 2013-10         E  0
    # 30: 2014-01         E  1
    #       Month Complaint V1
    

    Alternatively, just use table (???)

    data.table(table(test[, c("Month", "Complaint")]))
    #       Month Complaint N
    #  1: 2013-10         A 0
    #  2: 2013-11         A 1
    #  3: 2013-12         A 1
    #  4: 2014-01         A 0
    #  5: 2014-02         A 2
    #  6: 2014-03         A 1
    #  7: 2013-10         B 0
    # ::: S N I P :::
    # 28: 2014-01         E 1
    # 29: 2014-02         E 0
    # 30: 2014-03         E 0
    #       Month Complaint N
    
    0 讨论(0)
提交回复
热议问题