Looping across multiple variables and parameters using map() and mutate()

眼角桃花 2021-01-25 23:59

I'm having trouble figuring out how to effectively map across multiple parameters and variables within a tbl to generate new variables.

In the "real" version, I basica

2 answers
  •  陌清茗 (original poster)
    2021-01-26 00:32

    I think one of the issues making this task difficult is that the current setup is not very "tidy". For example, low.a, low.b, med.a, etc. appear to be what I understand to be 'untidy' columns: the level (low, med, high) is stored in the column names rather than in a column of its own.

    Below is one possible approach (which can probably be improved) that uses neither a for loop nor a custom function. The key idea is to take the initial pracdf and expand the existing rows so there is one row per ID for each "level" (low, med, and high). With the data in that shape, d can be calculated for all three levels in a single mutate() step.

    (Edited for readability and to include Jens Leerssen's suggestions)

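    library(dplyr)
    library(tidyr)
    set.seed(123)

    # Example data: one row per ID, with a parameter p and variables a, b, c
    pracdf <- tibble(ID = letters,
                     p = runif(26, 100, 1000),
                     a = runif(26),
                     b = runif(26),
                     c = runif(26))

    # Lookup table: one row per level, holding the multiplier for that level
    levdf <- tibble(level = c("low", "med", "high"),
                    level_val = c(0.8, 1.0, 1.2))

    # The two tables share no column names, so merge() returns every
    # combination of their rows (26 IDs x 3 levels = 78 rows)
    tidy_df <- pracdf %>% merge(levdf) %>%
      mutate(d = p * (level_val * a) * (level_val * b) * c) %>%
      select(-level_val) %>% arrange(ID) %>% as_tibble()

    tidy_df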
    
    #> # A tibble: 78 x 7
    #>       ID        p         a         b         c level         d
    #>     <chr>    <dbl>     <dbl>     <dbl>     <dbl> <chr>     <dbl>
    #>  1     a 358.8198 0.5440660 0.7989248 0.3517979   low 35.116168
    #>  2     a 358.8198 0.5440660 0.7989248 0.3517979   med 54.869013
    #>  3     a 358.8198 0.5440660 0.7989248 0.3517979  high 79.011379
    #>  4     b 809.4746 0.5941420 0.1218993 0.1111354   low  4.169914
    #>  5     b 809.4746 0.5941420 0.1218993 0.1111354   med  6.515490
    #>  6     b 809.4746 0.5941420 0.1218993 0.1111354  high  9.382306
    #>  7     c 468.0792 0.2891597 0.5609480 0.2436195   low 11.837821
    #>  8     c 468.0792 0.2891597 0.5609480 0.2436195   med 18.496595
    #>  9     c 468.0792 0.2891597 0.5609480 0.2436195  high 26.635096
    #> 10     d 894.7157 0.1471136 0.2065314 0.6680556   low 11.622957
    #> # ... with 68 more rows
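
    The row expansion above relies on merge() returning every combination of rows when the two tables share no column names. As a sketch of the same step with an explicit cross join, dplyr::cross_join() can be used instead. This assumes dplyr 1.1.0 or later, and the name tidy_df2 is only for illustration.

    ```r
    library(dplyr)

    set.seed(123)
    pracdf <- tibble(ID = letters,
                     p = runif(26, 100, 1000),
                     a = runif(26),
                     b = runif(26),
                     c = runif(26))

    levdf <- tibble(level = c("low", "med", "high"),
                    level_val = c(0.8, 1.0, 1.2))

    # cross_join() pairs every row of pracdf with every row of levdf
    # (assumes dplyr >= 1.1.0; tidy_df2 is a hypothetical name)
    tidy_df2 <- cross_join(pracdf, levdf) %>%
      mutate(d = p * (level_val * a) * (level_val * b) * c) %>%
      select(-level_val)

    nrow(tidy_df2)  # 26 IDs x 3 levels = 78 rows
    ```

    Naming the join makes it clearer to a reader that the Cartesian product is intentional rather than a side effect of merge() finding no key columns.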
    

    However, the long layout of tidy_df might not be the format you want for the final data. If you need one row per ID again, you can reshape tidy_df with tidyr::gather, tidyr::unite, and tidyr::spread.

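    # Stack a, b, and d into key/value pairs, build names such as "low_a",
    # then spread those names back out into one column each
    tidy_df %>%
      gather(variable, value, a, b, d) %>%
      unite(level_variable, level, variable) %>%
      spread(level_variable, value)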
    
    #> # A tibble: 26 x 12
    #>       ID        p         c     high_a     high_b     high_d      low_a
    #>  * <chr>    <dbl>     <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
    #>  1     a 358.8198 0.3517979 0.54406602 0.79892485  79.011379 0.54406602
    #>  2     b 809.4746 0.1111354 0.59414202 0.12189926   9.382306 0.59414202
    #>  3     c 468.0792 0.2436195 0.28915974 0.56094798  26.635096 0.28915974
    #>  4     d 894.7157 0.6680556 0.14711365 0.20653139  26.151654 0.14711365
    #>  5     e 946.4206 0.4176468 0.96302423 0.12753165  69.905442 0.96302423
    #>  6     f 141.0008 0.7881958 0.90229905 0.75330786 108.778072 0.90229905
    #>  7     g 575.2949 0.1028646 0.69070528 0.89504536  52.681362 0.69070528
    #>  8     h 903.1771 0.4348927 0.79546742 0.37446278 168.480110 0.79546742
    #>  9     i 596.2915 0.9849570 0.02461368 0.66511519  13.845603 0.02461368
    #> 10     j 510.9533 0.8930511 0.47779597 0.09484066  29.775361 0.47779597
    #> # ... with 16 more rows, and 5 more variables: low_b <dbl>, low_d <dbl>,
    #> #   med_a <dbl>, med_b <dbl>, med_d <dbl>
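
    gather() and spread() still work, but tidyr 1.0.0 and later document them as superseded by pivot_longer() and pivot_wider(). As a rough sketch of the same reshaping in that style, assuming tidyr >= 1.0.0 and rebuilding tidy_df as earlier in this answer (the name wide_df is only illustrative):

    ```r
    library(dplyr)
    library(tidyr)

    set.seed(123)
    pracdf <- tibble(ID = letters,
                     p = runif(26, 100, 1000),
                     a = runif(26),
                     b = runif(26),
                     c = runif(26))

    levdf <- tibble(level = c("low", "med", "high"),
                    level_val = c(0.8, 1.0, 1.2))

    tidy_df <- pracdf %>% merge(levdf) %>%
      mutate(d = p * (level_val * a) * (level_val * b) * c) %>%
      select(-level_val) %>% arrange(ID) %>% as_tibble()

    # One row per ID; a, b, and d each get one column per level.
    # names_glue builds names such as "low_a" instead of the default "a_low".
    # (wide_df is a hypothetical name; assumes tidyr >= 1.0.0)
    wide_df <- tidy_df %>%
      pivot_wider(names_from = level,
                  values_from = c(a, b, d),
                  names_glue = "{level}_{.value}")

    dim(wide_df)  # 26 rows, 12 columns
    ```

    The column order may differ from the spread() output, which sorts the new columns alphabetically. As in that output, a and b are repeated unchanged for every level (low_a, med_a, and high_a are identical); only d varies by level.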
    
