Using lapply to create new columns based on old columns

攒了一身酷 2021-01-28 19:14

My data looks as follows:

DF <- structure(list(No_Adjusted_Gross_Income = c(183454, 241199, 249506
), NoR_from_1_to_5000 = c(1035373, 4272260, 1124098), NoR_f         


        
3 Answers
  •  执念已碎
    2021-01-28 19:51

    The OP has asked in a comment for a grouping variable.

    Although the accepted answer apparently does what the OP initially asked for, I would like to suggest a completely different approach in which the data is stored and processed in tidy (long) format. IMHO, processing data in long format is much more straightforward and flexible, including for aggregation & grouping.

    For this, the dataset is reshaped from wide, Excel-style format to long, SQL-style format by

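    library(data.table)
    col <- "NoR"
    # as.data.table() makes sure melt() uses the data.table method and := works below
    long <- melt(as.data.table(DF), measure.vars = patterns(col), value.name = col, variable.name = "range")
    # strip the "NoR_" prefix so that only the range label remains
    long[, range := stringr::str_remove(range, paste0(col, "_"))]
    long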
    
       No_Adjusted_Gross_Income              range     NoR
    1:                   183454     from_1_to_5000 1035373
    2:                   241199     from_1_to_5000 4272260
    3:                   249506     from_1_to_5000 1124098
    4:                   183454 from_5000_to_10000  319540
    5:                   241199 from_5000_to_10000 4826042
    6:                   249506 from_5000_to_10000 1959866
    

    In tidy (long) format there is one row for each observation and one column for each variable (see Chapter 12.2 of Hadley Wickham's book R for Data Science).

    The vector of multipliers val also needs to be reshaped from wide to long format:

    valDF <- long[, .(range = unique(range), val)]
    valDF
    
                    range    val
    1:     from_1_to_5000 2500.5
    2: from_5000_to_10000 7500.0
    

    Now, valDF is also in tidy format as there is one row for each range.
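
    Since the dput() output in the question is cut off, here is a minimal self-contained sketch of the two steps so far. The data values are copied from the printed tables in this answer. Building valDF with explicit range labels is my own variation: it avoids relying on val being in the same order as the columns of DF.

    ```r
    library(data.table)

    # Data reconstructed from the printed tables (the dput in the question is truncated)
    DF <- data.table(
      No_Adjusted_Gross_Income = c(183454, 241199, 249506),
      NoR_from_1_to_5000       = c(1035373, 4272260, 1124098),
      NoR_from_5000_to_10000   = c(319540, 4826042, 1959866)
    )

    # Wide to long: one row per combination of income record and range
    long <- melt(DF, measure.vars = patterns("^NoR_"),
                 value.name = "NoR", variable.name = "range")
    # Drop the "NoR_" prefix; sub() also converts the factor to character
    long[, range := sub("^NoR_", "", range)]

    # Multipliers matched to ranges by label instead of by position
    valDF <- data.table(
      range = c("from_1_to_5000", "from_5000_to_10000"),
      val   = c(2500.5, 7500)
    )
    ```

    With labels spelled out, a multiplier can only ever be attached to the range it is named for, even if the columns of DF are reordered later.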

    Finally, we can add a new column AGI to long by an update join:

    long[valDF, on = "range", AGI := val * NoR][]
    
       No_Adjusted_Gross_Income              range     NoR         AGI
    1:                   183454     from_1_to_5000 1035373  2588950187
    2:                   241199     from_1_to_5000 4272260 10682786130
    3:                   249506     from_1_to_5000 1124098  2810807049
    4:                   183454 from_5000_to_10000  319540  2396550000
    5:                   241199 from_5000_to_10000 4826042 36195315000
    6:                   249506 from_5000_to_10000 1959866 14698995000
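
    One property of the update join worth knowing: only rows of long that find a match in valDF are updated. The sketch below illustrates this with two rows taken from the output above plus one made-up row. The range "from_10000_to_15000" and its NoR of 100 are hypothetical and not part of OP's data.

    ```r
    library(data.table)

    # Two rows from the printed output plus one hypothetical range
    # ("from_10000_to_15000") that has no multiplier in valDF
    long <- data.table(
      range = c("from_1_to_5000", "from_5000_to_10000", "from_10000_to_15000"),
      NoR   = c(1035373, 319540, 100)
    )
    valDF <- data.table(
      range = c("from_1_to_5000", "from_5000_to_10000"),
      val   = c(2500.5, 7500)
    )

    # Update join: each row of valDF is matched to rows of long on "range",
    # and AGI is assigned by reference in the matched rows only
    long[valDF, on = "range", AGI := val * NoR]
    long
    ```

    The unmatched row keeps NA in AGI, which makes a missing multiplier easy to spot with long[is.na(AGI)].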
    

    If required for presentation, the dataset can be reshaped back from long to wide format:

    dcast(long, No_Adjusted_Gross_Income ~ range, value.var = c("NoR", "AGI"))
    
       No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000 AGI_from_1_to_5000 AGI_from_5000_to_10000
    1:                   183454            1035373                 319540         2588950187             2396550000
    2:                   241199            4272260                4826042        10682786130            36195315000
    3:                   249506            1124098                1959866         2810807049            14698995000
    

    which reproduces OP's expected result. Note that the variable names vn are created automagically.


    Aggregation and grouping can be performed while reshaping:

    dcast(long, No_Adjusted_Gross_Income ~ range, sum, value.var = c("NoR", "AGI"))
    
       No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000 AGI_from_1_to_5000 AGI_from_5000_to_10000
    1:                   183454            1035373                 319540         2588950187             2396550000
    2:                   241199            4272260                4826042        10682786130            36195315000
    3:                   249506            1124098                1959866         2810807049            14698995000
    

    or

    dcast(long, No_Adjusted_Gross_Income ~ ., sum, value.var = c("NoR", "AGI"))
    
       No_Adjusted_Gross_Income     NoR         AGI
    1:                   183454 1354913  4985500187
    2:                   241199 9098302 46878101130
    3:                   249506 3083964 17509802049
    

    Alternatively, aggregation & grouping can be performed in long format:

    long[, lapply(.SD, sum), .SDcols = c("NoR", "AGI"), by = No_Adjusted_Gross_Income]
    
       No_Adjusted_Gross_Income     NoR         AGI
    1:                   183454 1354913  4985500187
    2:                   241199 9098302 46878101130
    3:                   249506 3083964 17509802049
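
    The same pattern works for any other grouping column. As a sketch, this computes totals per range instead of per No_Adjusted_Gross_Income. The long data is rebuilt here from the printed output so that the snippet runs on its own, and the multipliers 2500.5 and 7500 are the ones shown in valDF.

    ```r
    library(data.table)

    # Long-format data copied from the printed output above
    long <- data.table(
      No_Adjusted_Gross_Income = rep(c(183454, 241199, 249506), times = 2),
      range = rep(c("from_1_to_5000", "from_5000_to_10000"), each = 3),
      NoR   = c(1035373, 4272260, 1124098, 319540, 4826042, 1959866)
    )
    # Recreate AGI directly, using the multipliers shown in valDF
    long[, AGI := NoR * ifelse(range == "from_1_to_5000", 2500.5, 7500)]

    # Totals per range rather than per income record
    by_range <- long[, lapply(.SD, sum), .SDcols = c("NoR", "AGI"), by = "range"]
    by_range
    ```

    Only the by argument changes. In wide format the same question would mean summing each NoR_* and AGI_* column separately.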
    
