Proportional Tables by Group

后端 未结 1 420
粉色の甜心
粉色の甜心 2021-01-21 04:02

I have a data.frame structured as follows:

location               gender        15.19     20.30      31.40      41.64      65.
New York                Female             


        
相关标签:
1条回答
  • 2021-01-21 04:34
    library("data.table")
    selected_cols <- colnames(df)[3:7]
    setDT(df)[, prop.table(.SD), by = location, .SDcols = selected_cols]
    
    #         location      X15.19    X20.30     X31.40     X41.64        X65.
    # 1:      New York 0.014925373 0.3059701 0.09701493 0.14179104 0.007462687
    # 2:      New York 0.007462687 0.1716418 0.11194030 0.12686567 0.014925373
    # 3: San Francisco 0.008928571 0.2410714 0.12500000 0.12500000 0.026785714
    # 4: San Francisco 0.035714286 0.2142857 0.12500000 0.08928571 0.008928571
    # 5:   Mexico City 0.006493506 0.2597403 0.16883117 0.07142857 0.019480519
    # 6:   Mexico City 0.025974026 0.1493506 0.22727273 0.05194805 0.019480519
    # 7:         Paris 0.026315789 0.1578947 0.13157895 0.07894737 0.000000000
    # 8:         Paris 0.013157895 0.2631579 0.17105263 0.14473684 0.013157895
    

    Verify results: Whether each city sum to 1

    a1 <- setDT(df)[, prop.table(.SD), by = location, .SDcols = selected_cols]
    sum(subset(a1, location == "New York", select = selected_cols))
    # [1] 1
    

    Then gender column can be combined with a1

    do.call(cbind, list(gender = df$gender, a1))
    

    Data:

    dput(df)
    structure(list(location = c("New York", "New York", "San Francisco", 
    "San Francisco", "Mexico City", "Mexico City", "Paris", "Paris"
    ), gender = c("Female", "Male", "Female", "Male", "Female", "Male", 
    "Female", "Male"), X15.19 = c(2L, 1L, 1L, 4L, 1L, 4L, 2L, 1L), 
        X20.30 = c(41L, 23L, 27L, 24L, 40L, 23L, 12L, 20L), X31.40 = c(13L, 
        15L, 14L, 14L, 26L, 35L, 10L, 13L), X41.64 = c(19L, 17L, 
        14L, 10L, 11L, 8L, 6L, 11L), X65. = c(1L, 2L, 3L, 1L, 3L, 
        3L, 0L, 1L)), .Names = c("location", "gender", "X15.19", 
    "X20.30", "X31.40", "X41.64", "X65."), row.names = c(NA, -8L), class = c("data.table", 
    "data.frame"), .internal.selfref = <pointer: 0x0000000000200788>)
    

    EDIT: As per @Sumedh's suggestion in the comments below, gender column will not drop and all selected_cols has to be double type. This is due to proportions from prop.table being of double type. Else, RHS = LHS data type coercion error will arise when assignment by reference := is performed on the selected_cols.

    setDT(df)[, (selected_cols) := prop.table(.SD), by = location, .SDcols = selected_cols]
    
    0 讨论(0)
提交回复
热议问题