Disaggregate in the context of a time series

前端 未结 3 1303
遥遥无期
遥遥无期 2021-01-23 06:47

I have a dataset that I want to visualize overall and disaggregated by a few different variables. I created a flexdashboard with a toy shiny app to select the type of disaggrega

相关标签:
3条回答
  • 2021-01-23 07:21

    I think you can make some gains by changing the order of your preparation. Right now the flow of your app is approximately:

    Data => prepare all combinations => select desired visualization => make plot

    Consider instead:

    Data => select desired visualization => prepare required combination => make plot

    This would make use of Shiny's reactivity to (re)prepare the data required for the requested plot in response to changes in the user's selection.

    By way of code snippets (Sorry, I don't have sufficient familiarity with flexdashboard and tibbletime to ensure this code runs, but I hope it is enough to highlight the approach):

    Your control selects the column you want to focus on (note we use "All" = "'1'" so this evaluates to a constant in the group-by, else it has to be handled separately):

    radioButtons("diss", label = "Disaggregation",
                 choices = list("All" = "'1'",
                                "By Sex" = "sex",
                                "By Language" = "lang",
                                "By other" = "column_name_of_'other'"), 
                 selected = 1)
    

    And then use this in your group by to prepare only the data required for the present visualization (you'll need to adjust the function suggested by @Jon_Spring in response to this earlier group-by):

    preped_dat = reactive({
      dat %>%
        group_by_(input$diss) %>%
        # etc
    })
    

    Before plotting (you'll need to adjust the plotting function in response to the possible change in data format):

    renderDygraph({
      totals = preped_data()
      dygraph(totals) %>%
          dySeries("total", label = ) %>%
          dyRangeSelector()
    })
    

    With regard to group_by you can use group_by_ if all your arguments are text strings, or group_by(!! sym(input$diss), other_column_name) if you want to mix the text string input from your control with other column names.

    One possible disadvantage of this change in approach is reduced responsiveness during interactivity if your data set is large. The present approach does all the computation up front and then minimal computation each selection - this may be preferable if you have a large amount of processing. My suggested approach will have minimal up front processing and moderate computation each selection.

    0 讨论(0)
  • 2021-01-23 07:23

    Thanks for explaining more about your goals. I think the approach @simon-s-a suggests will simplify things. If we can run the grouping dynamically, and structure it so that we don't need to know the possible components in those groups beforehand, it will be a lot easier to maintain.

    Here's a minimum viable product that rebuilds the plotting function to include the grouping logic inside it.

    1. Once grouped by date and whatever our grouping variable is, it counts how many rows each group has, then spreads those so each group gets a column.

    2. Then I use padr::pad to pad out any missing time rows in between, and replace all the NA's with zeros.

    3. Finally, that data frame is converted to an xts object and fed into dygraph, which seems to handle the multiple columns automatically.

    Here:

    ---
    title: "test"
    output: 
      flexdashboard::flex_dashboard:
        theme: bootstrap
    runtime: shiny
    ---
    
    ```{r setup, include=FALSE}
    library(flexdashboard)
    library(tidyverse)
    library(tibbletime)
    library(dygraphs)
    library(magrittr)
    library(xts)
    ```
    
    ```{r global, include=FALSE}
    # generate data
    set.seed(1)
    dat <- data.frame(date = seq(as.Date("2018-01-01"), 
                                 as.Date("2018-06-30"), 
                                 "days"),
                      sex = sample(c("male", "female"), 181, replace=TRUE),
                      lang = sample(c("english", "spanish"), 181, replace=TRUE),
                      age = sample(20:35, 181, replace=TRUE))
    dat <- dplyr::sample_n(dat, 80)
    ```
    
    Sidebar {.sidebar}
    =====================================
    
    ```{r}
    
    radioButtons("diss", label = "Disaggregation",
                 choices = list("All" = "Total",
                                "By Sex" = "sex",
                                "By Language" = "lang"), 
                 selected = "Total")
    ```
    
    Page 1
    =====================================
    
    ```{r plot}
    
    renderDygraph({
      grp_col <- rlang::sym(input$diss) # This converts the input selection to a symbol
    
      dat %>%
        mutate(Total = 1) %>% # This is a hack to let us "group" by Total -- all one group
    
        # Here's where we unquote the symbol so that dplyr can use it 
        #   to refer to a column. In this case I make a dummy column 
        #   that's a copy of whatever column we want to group
        mutate(my_group = !!grp_col) %>%
    
        # Now we make a group for every existing combination of week 
        #   (using lubridate::floor_date) and level of our grouping column,
        #   count how many rows in each group, and spread that to wide format.
        group_by(date = lubridate::floor_date(date, "1 week"), my_group) %>%
        count() %>% spread(my_group, n) %>% ungroup() %>%
    
        # padr:pad() fills in any missing weeks in the sequence with new rows
        #   Then we replace all the NA's with zeroes.
        padr::pad() %>% replace(is.na(.), 0) %>%
    
        # Finally we can convert to xts and feed the wide table into digraph.
        xts::xts(order.by = .$date) %>%
        dygraph() %>%
        dyRangeSelector() %>%
        dyOptions(
          useDataTimezone = FALSE, stepPlot = TRUE,
          drawGrid = FALSE, fillGraph = TRUE
        )
    })
    ```
    
    0 讨论(0)
  • 2021-01-23 07:24

    This is a good place to make a function, to shorten your code and make it less prone to error.

    http://r4ds.had.co.nz/functions.html

    A complicating bit is that programming with dplyr often requires wading into a framework called tidyeval, which is very powerful but can be intimidating. https://dplyr.tidyverse.org/articles/programming.html

    (Here's an alternative approach that sidesteps tidyeval: https://cran.r-project.org/web/packages/seplyr/vignettes/using_seplyr.html)

    In your scenario, it's possible to avoid these challenges entirely by doing a bit of manipulation before and after your function. It's not as elegant, but works.

    BTW, I can't guarantee it'll work since you didn't share a verifiable reprex (e.g. including a sample of data with the same form as yours), but it worked with the fake data I made up. (See bottom.) Sorry, I missed the chunk where your sample data was provided.

    prep_dat <- function(filtered_dat, col_name = "total") {
      filtered_dat %>%
        mutate(new = 1) %>%
        arrange(date) %>%
      # time series analysis
      tibbletime::as_tbl_time(index = date) %>% # convert to tibble time object
        select(date, new) %>%
        tibbletime::collapse_by("1 week", side = "start", clean = TRUE) %>%
        group_by(date) %>%
        mutate(total = sum(new, na.rm = TRUE)) %>%
        distinct(date, .keep_all = TRUE) %>%
        ungroup() %>%
        # expand matrix to include weeks without data
        complete(
          date = seq(date[1], date[length(date)], by = "1 week"),
          fill = list(total = 0)
        )
    }
    

    Then you could call it with your filtered data and the name of the total column. This fragment should be able to replace the ~20 lines you're currently using:

    males <- prep_dat(dat_fake %>% 
      filter(sex == "male")) %>% 
      rename("total_m" = "total")
    

    Fake data that I tested on:

    dat_fake <- tibble(
      date = as.Date("2018-01-01") + runif(500, 0, 100),
      new  = runif(500, 0, 100),
      sex  = sample(c("male", "female"), 
                    500, replace = TRUE),
      lang = sample(c("english", "french", "spanish", "portuguese", "tagalog"), 
                    500, replace = TRUE)
    )
    
    0 讨论(0)
提交回复
热议问题