Looping across multiple variables and parameters using map() and mutate()

前端 未结 2 1258
眼角桃花
眼角桃花 2021-01-25 23:59

I\'m having trouble figuring out how to effective map across multiple parameters and variables within a tbl to generate new variables.

In the \"real\" version, I basica

相关标签:
2条回答
  • 2021-01-26 00:32

    I think one of the issues making this task difficult is the current set up might not be very "tidy". E.g. low.a, low.b, med.a etc appear to be examples of what I understand to be 'untidy' columns.

    Below is one possible approach (which I am fairly sure can probably be improved) which doesn't use a for loop or custom function at all. The key idea is to take the initial pracdf and expand the existing rows so there is one row for each "level" (i.e., low, med, and high). Doing this lets us calculate d in a single step with no for loops for low, med, and high.

    (Edited for readability and to include Jens Leerssen's suggestions)

    library(dplyr)
    library(tidyr)
    set.seed(123)
    pracdf <- tibble(ID = letters,
                     p = runif(26, 100, 1000),
                     a = runif(26),
                     b = runif(26),
                     c = runif(26))
    
    levdf <- tibble(level = c("low", "med", "high"),
                    level_val = c(0.8, 1.0, 1.2))
    
    tidy_df <- pracdf %>% merge(levdf) %>%
      mutate(d = p * (level_val * a) * (level_val * b) * c) %>%
      select(-level_val) %>% arrange(ID) %>% as_tibble()
    
    tidy_df
    
    #> # A tibble: 78 x 7
    #>       ID        p         a         b         c level         d
    #>    <chr>    <dbl>     <dbl>     <dbl>     <dbl> <chr>     <dbl>
    #>  1     a 358.8198 0.5440660 0.7989248 0.3517979   low 35.116168
    #>  2     a 358.8198 0.5440660 0.7989248 0.3517979   med 54.869013
    #>  3     a 358.8198 0.5440660 0.7989248 0.3517979  high 79.011379
    #>  4     b 809.4746 0.5941420 0.1218993 0.1111354   low  4.169914
    #>  5     b 809.4746 0.5941420 0.1218993 0.1111354   med  6.515490
    #>  6     b 809.4746 0.5941420 0.1218993 0.1111354  high  9.382306
    #>  7     c 468.0792 0.2891597 0.5609480 0.2436195   low 11.837821
    #>  8     c 468.0792 0.2891597 0.5609480 0.2436195   med 18.496595
    #>  9     c 468.0792 0.2891597 0.5609480 0.2436195  high 26.635096
    #> 10     d 894.7157 0.1471136 0.2065314 0.6680556   low 11.622957
    #> # ... with 68 more rows
    

    However, the result above might not be the format you want the final data in. But we can take care of this by doing some gathering and spreading of tidy_df using tidyr::gather and tidyr::spread.

    tidy_df %>%
      gather(variable, value, a, b, d) %>%
      unite(level_variable, level, variable) %>%
      spread(level_variable, value)
    
    #> # A tibble: 26 x 12
    #>       ID        p         c     high_a     high_b     high_d      low_a
    #>  * <chr>    <dbl>     <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
    #>  1     a 358.8198 0.3517979 0.54406602 0.79892485  79.011379 0.54406602
    #>  2     b 809.4746 0.1111354 0.59414202 0.12189926   9.382306 0.59414202
    #>  3     c 468.0792 0.2436195 0.28915974 0.56094798  26.635096 0.28915974
    #>  4     d 894.7157 0.6680556 0.14711365 0.20653139  26.151654 0.14711365
    #>  5     e 946.4206 0.4176468 0.96302423 0.12753165  69.905442 0.96302423
    #>  6     f 141.0008 0.7881958 0.90229905 0.75330786 108.778072 0.90229905
    #>  7     g 575.2949 0.1028646 0.69070528 0.89504536  52.681362 0.69070528
    #>  8     h 903.1771 0.4348927 0.79546742 0.37446278 168.480110 0.79546742
    #>  9     i 596.2915 0.9849570 0.02461368 0.66511519  13.845603 0.02461368
    #> 10     j 510.9533 0.8930511 0.47779597 0.09484066  29.775361 0.47779597
    #> # ... with 16 more rows, and 5 more variables: low_b <dbl>, low_d <dbl>,
    #> #   med_a <dbl>, med_b <dbl>, med_d <dbl>
    
    0 讨论(0)
  • 2021-01-26 00:42

    Consider a vectorized approach (forgive me for non-tidyverse data wrangling) where all new columns can be handled in one call. Use seed(888) before random data to reproduce output:

    f1 <- function(df, vars) {
      df[paste0("d.", vars)] <- df$p * df[vars] * df$c
      return(df)
    }
    
    newpracdf <- f1(pracdf, c("low.a","high.a","med.a","med.b","low.b","high.b"))
    

    Output

    # # A tibble: 26 x 15
    #       ID        p      med.a      med.b          c      low.a     low.b    high.a    high.b    d.low.a   d.high.a    d.med.a    d.med.b    d.low.b   d.high.b
    #    <chr>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>     <dbl>     <dbl>     <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
    #  1     a 122.9573 0.65746601 0.43123587 0.81314570 0.52597281 0.3449887 0.7889592 0.5174830  52.587917  78.881876  65.734897  43.115909  34.492727  51.739091
    #  2     b 412.0127 0.19793909 0.77148952 0.26039116 0.15835127 0.6171916 0.2375269 0.9257874  16.988630  25.482945  21.235787  82.768834  66.215068  99.322601
    #  3     c 155.1248 0.30834064 0.99850558 0.57853922 0.24667251 0.7988045 0.3700088 1.1982067  22.137823  33.206735  27.672279  89.611689  71.689351 107.534027
    #  4     d 715.3769 0.85517040 0.81715464 0.84196723 0.68413632 0.6537237 1.0262045 0.9805856 412.071636 618.107455 515.089546 492.191742 393.753393 590.630090
    #  5     e 790.5284 0.12617255 0.59290522 0.54879020 0.10093804 0.4743242 0.1514071 0.7114863  43.790379  65.685568  54.737973 257.222588 205.778070 308.667105
    #  6     f 193.6968 0.15173488 0.93054996 0.08587380 0.12138791 0.7444400 0.1820819 1.1166600   2.019104   3.028655   2.523879  15.478286  12.382629  18.573943
    #  7     g 451.6000 0.88123996 0.62858787 0.12546384 0.70499197 0.5028703 1.0574880 0.7543054  39.944473  59.916709  49.930591  35.615457  28.492365  42.738548
    #  8     h 342.3741 0.09952918 0.56932309 0.10980862 0.07962334 0.4554585 0.1194350 0.6831877   2.993489   4.490234   3.741861  21.404056  17.123245  25.684867
    #  9     i 143.9489 0.42407685 0.94929822 0.02754267 0.33926148 0.7594386 0.5088922 1.1391579   1.345083   2.017624   1.681353   3.763718   3.010975   4.516462
    # 10     j 911.8069 0.25822441 0.08934875 0.55244369 0.20657953 0.0714790 0.3098693 0.1072185 104.058645 156.087967 130.073306  45.006930  36.005544  54.008316
    # # ... with 16 more rows
    
    0 讨论(0)
提交回复
热议问题