Disaggregate in the context of a time series

遥遥无期 2021-01-23 06:47

I have a dataset that I want to visualize both overall and disaggregated by a few different variables. I created a flexdashboard with a toy shiny app to select the type of disaggrega…

3 answers
  •  星月不相逢
    2021-01-23 07:24

    This is a good place to write a function, to shorten your code and make it less prone to error.

    http://r4ds.had.co.nz/functions.html

    A complicating bit is that programming with dplyr often requires wading into a framework called tidyeval, which is very powerful but can be intimidating. https://dplyr.tidyverse.org/articles/programming.html
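    For a sense of what the tidyeval route looks like, here is a minimal sketch of a hypothetical helper, `count_by_date()` (my own illustration, not part of the code below), that takes the output column name as a string and uses `!!` with `:=` to create that column. It counts rows per day rather than per week, so it only demonstrates the column-naming mechanism.

    ```r
    library(dplyr)

    # Hypothetical helper (illustration only): count rows per date and name the
    # count column after the string in `col_name`, via tidyeval's `!!` and `:=`
    count_by_date <- function(dat, col_name = "total") {
      dat %>%
        group_by(date) %>%
        summarise(!!col_name := n(), .groups = "drop")
    }

    # Tiny made-up data: two rows for men on different days, one for a woman
    dat <- tibble(
      date = as.Date(c("2018-01-01", "2018-01-01", "2018-01-02")),
      sex  = c("male", "female", "male")
    )

    # Filter first, then let the function name the column directly
    males <- count_by_date(filter(dat, sex == "male"), col_name = "total_m")
    males
    ```

    With this style the `rename()` step disappears, at the cost of the tidyeval syntax.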

    (Here's an alternative approach that sidesteps tidyeval: https://cran.r-project.org/web/packages/seplyr/vignettes/using_seplyr.html)

    In your scenario, it's possible to avoid these challenges entirely by doing a bit of manipulation before and after your function (filtering beforehand, renaming afterwards). It's not as elegant, but it works.

    BTW, I can't guarantee it'll work on your data, but it worked with the fake data I made up (see bottom). (At first I thought you hadn't shared a reproducible example; sorry, I missed the chunk where your sample data was provided.)

    library(dplyr)
    library(tidyr)
    # tibbletime must also be installed; its functions are called with tibbletime::

    # Count cases per week and add weeks without cases as 0.
    # Filter the data before calling, and rename `total` afterwards.
    prep_dat <- function(filtered_dat) {
      filtered_dat %>%
        mutate(new = 1) %>%                          # each row counts as one case
        arrange(date) %>%
        # time series analysis
        tibbletime::as_tbl_time(index = date) %>%    # convert to tibbletime object
        select(date, new) %>%
        # replace each date with the start of its week
        tibbletime::collapse_by("1 week", side = "start", clean = TRUE) %>%
        group_by(date) %>%
        summarise(total = sum(new, na.rm = TRUE)) %>%
        ungroup() %>%
        # expand to include weeks without data
        complete(
          date = seq(min(date), max(date), by = "1 week"),
          fill = list(total = 0)
        )
    }
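    To see what the final `complete()` step does on its own, here is a small made-up example (not from the question's data): a weekly series with a missing week gets that week added with a total of 0.

    ```r
    library(dplyr)
    library(tidyr)

    # Made-up weekly counts with a gap: the week of 2018-01-08 has no row
    weekly <- tibble(
      date  = as.Date(c("2018-01-01", "2018-01-15")),
      total = c(3, 2)
    )

    # Add every week between the first and last date; new rows get total = 0
    filled <- weekly %>%
      complete(
        date = seq(min(date), max(date), by = "1 week"),
        fill = list(total = 0)
      )
    filled
    ```

    Without `fill`, the added week would show `NA` instead of 0, which would break a line chart of weekly counts.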
    

    Then you can call it on your filtered data and rename the `total` column afterwards. This fragment should be able to replace the ~20 lines you're currently using:

    males <- dat_fake %>%
      filter(sex == "male") %>%
      prep_dat() %>%
      rename(total_m = total)
    

    Fake data that I tested on:

    library(dplyr)  # tibble() is re-exported by dplyr

    dat_fake <- tibble(
      date = as.Date("2018-01-01") + runif(500, 0, 100),
      new  = runif(500, 0, 100),  # overwritten by prep_dat(), which counts rows
      sex  = sample(c("male", "female"),
                    500, replace = TRUE),
      lang = sample(c("english", "french", "spanish", "portuguese", "tagalog"),
                    500, replace = TRUE)
    )
    
