Compute all fixed window averages with dplyr and RcppRoll

前端 未结 2 877
囚心锁ツ
囚心锁ツ 2021-01-28 17:03

I would like to compute all (or at least many) fixed window averages using dplyr and RcppRoll. For example, if I want to compute the average wind speed from the storms

相关标签:
2条回答
  • 2021-01-28 17:29

    Just use the power of quoting and unquoting! That's what you have:

    library(dplyr)
    library(RcppRoll)
    
    set.seed(1)
    storms <- storms[storms$name %in% sample(storms$name, size = 4),]
    
    storms_subset <- storms %>%
      select(name, year, month, day, hour, wind) %>%
      group_by(name) %>%
      arrange(name, year, month, day, hour) %>%
      mutate_at("wind", .funs = funs(
        "avg_4" = roll_meanr(., n = 4, fill = NA),
        "avg_5" = roll_meanr(., n = 5, fill = NA),
        "avg_6" = roll_meanr(., n = 6, fill = NA)
      ))
    

    Now let's make a function that builds a bunch of expressions like roll_meanr(x, n) for different xs and ns.

    make_rollmeans <- function(..., .n = 3) {
      # this line captures vars you typed in
      .dots <- rlang::exprs(...)
    
      # now you iterate over captured variables...
      q <- purrr::map(.dots, function(.var) {
        # ... and over window sizes
        purrr::map(.n, function(.nn) {
          # for each (variable, window) pair make an expression
          rlang::expr(RcppRoll::roll_meanr(!!.var, !!(.nn)))
        }) %>% 
          # set proper names by combining variable name, "avg", and window size
          purrr::set_names(paste0(as.character(.var), "_avg_", .n))
      }) %>%
        # and finally remove inner structure of list of expressions
        # after that you'll have a list of expressions with depth 1 
        purrr::flatten() 
      q
    }
    

    All the magic comes from rlang::expr(RcppRoll::roll_meanr(!!.var, !!(.nn))). With !!.var you substitute .var with input variable name, i.e. wind. With !!.nn you substitute .nn with number. Next, you quote the expression with rlang::expr(...).

    This function gets variable names without "" and vector of window sizes. Output looks like this:

    make_rollmeans(wind, pressure, .n = c(3, 5))
    #> $wind_avg_3
    #> RcppRoll::roll_meanr(wind, 3)
    #> 
    #> $wind_avg_5
    #> RcppRoll::roll_meanr(wind, 5)
    #> 
    #> $pressure_avg_3
    #> RcppRoll::roll_meanr(pressure, 3)
    #> 
    #> $pressure_avg_5
    #> RcppRoll::roll_meanr(pressure, 5)
    

    You can see expressions you are looking for.

    Next, you can put make_rollmeans inside mutate() call using !!! (bang-bang-bang) operator for unquoting expressions built by it.

    select(storms_subset, wind) %>% mutate(!!!make_rollmeans(wind, .n = 3:20))
    #> Adding missing grouping variables: `name`
    #> # A tibble: 261 x 20
    #> # Groups:   name [4]
    #>    name     wind wind_avg_3 wind_avg_4 wind_avg_5 wind_avg_6 wind_avg_7
    #>    <chr>   <int>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
    #>  1 Ernesto    30       NA         NA          NA        NA         NA  
    #>  2 Ernesto    30       NA         NA          NA        NA         NA  
    #>  3 Ernesto    30       30.0       NA          NA        NA         NA  
    #>  4 Ernesto    35       31.7       31.2        NA        NA         NA  
    #>  5 Ernesto    40       35.0       33.8        33.       NA         NA  
    #>  6 Ernesto    50       41.7       38.8        37.       35.8       NA  
    #>  7 Ernesto    60       50.0       46.2        43.       40.8       39.3
    #>  8 Ernesto    55       55.0       51.2        48.       45.0       42.9
    #>  9 Ernesto    50       55.0       53.8        51.       48.3       45.7
    #> 10 Ernesto    45       50.0       52.5        52.       50.0       47.9
    #> # ... with 251 more rows, and 13 more variables: wind_avg_8 <dbl>,
    #> #   wind_avg_9 <dbl>, wind_avg_10 <dbl>, wind_avg_11 <dbl>,
    #> #   wind_avg_12 <dbl>, wind_avg_13 <dbl>, wind_avg_14 <dbl>,
    #> #   wind_avg_15 <dbl>, wind_avg_16 <dbl>, wind_avg_17 <dbl>,
    #> #   wind_avg_18 <dbl>, wind_avg_19 <dbl>, wind_avg_20 <dbl>
    

    I hope the result is the same as you are asked for. :)

    0 讨论(0)
  • 2021-01-28 17:36

    Using Base R, I hope it help:

    storms_wind <- storms %>%
        select(name, year, month, day, hour, wind) %>%
        group_by(name) %>%
        arrange(name, year, month, day, hour)
    
    multi_avg <- function(df, start, end) {
                     for(i in (strat:end)){
                     varname <- paste("avg", i , sep="_")
                     df[[varname]] <- with(df, roll_meanr(wind, n = i, fill = NA))
                    }
                 df
               }
    
    
    multi_avg(df=storms_wind, start=4,end=20) 
    
    0 讨论(0)
提交回复
热议问题