Convert Dataframe to make Waterfall Chart in ggplot2

老子叫甜甜 submitted on 2019-12-06 07:28:44

There are a few steps to get you to this, and I think that the dplyr package will help (used heavily below).

My understanding is that revenue gives the cumulative total revenue, rather than the daily change. If that is wrong, you would need to reverse some of these calculations.
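If `revenue` instead recorded each day's change, the reversal might look roughly like the sketch below. The `dailyChange` data frame and its values are hypothetical, invented only to show the direction of the calculation: the running total comes from `cumsum()`, and the previous day's level is that total minus the day's change.

```r
library(dplyr)

# Hypothetical input: here `revenue` is ASSUMED to hold the daily change,
# not the running total. All values are invented for illustration.
dailyChange <- data.frame(
  date = rep(as.Date("2017-03-01") + 0:2, times = 2),
  employee = rep(c("A", "B"), each = 3),
  revenue = c(10, -2, 4, 20, -10, 5)
)

reversed <-
  dailyChange %>%
  group_by(employee) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    change = revenue                   # the input already is the change
    , revenue = cumsum(change)         # running total per employee
    , previousDay = revenue - change   # level before today's change
  ) %>%
  ungroup()
```

The resulting columns have the same names as the ones the plots use, so the plotting code should not need to change under that assumption.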

The first step is to create a new data.frame that calculates the daily totals, then bind that back to the data.frame. Then, you can group_by the employees (including "Total") and add columns that will be created separately for each employee (value on the previous day, the change, and then whether it was an increase or a decrease).
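The original `df` is not reproduced here, so running the code that follows needs a stand-in. The sketch below assumes a long data frame with one row per employee per day and the columns `date`, `employee` and `revenue`. Only the values that appear in the printed output further down (all of 2017-03-01, and A to D on 2017-03-02) come from this answer; every other number is made up.

```r
# Hypothetical stand-in for the question's data (six employees, four days).
# Rows are ordered by date, then employee, to match the printed output.
df <- data.frame(
  date = rep(as.Date("2017-03-01") + 0:3, each = 6),
  employee = rep(LETTERS[1:6], times = 4),
  revenue = c(
    10, 20, 30, 40, 10, 40,   # 2017-03-01: taken from the printed output
     8, 10, 20, 50, 20, 10,   # 2017-03-02: A-D from the output, E and F invented
    12, 15, 25, 45, 30, 20,   # 2017-03-03: invented
    15, 25, 20, 60, 25, 30    # 2017-03-04: invented
  ),
  stringsAsFactors = FALSE    # keep employee as character so "Total" binds cleanly
)
```

With six employees and four days this gives 24 rows, which becomes 28 once the four "Total" rows are added, consistent with the "10 shown plus 18 more rows" in the output.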

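library(dplyr)    # bind_rows(), group_by(), summarise(), mutate(), lag(), %>%
library(ggplot2)  # used for the plots further down

toPlot <-
  bind_rows(
    df
    # daily totals across all employees, labelled as an extra "employee"
    , df %>%
      group_by(date) %>%
      summarise(revenue = sum(revenue)) %>%
      mutate(employee = "Total")
  ) %>%
  group_by(employee) %>%
  # lag() assumes rows are already in date order within each employee
  mutate(
    previousDay = lag(revenue, default = 0)
    , change = revenue - previousDay
    , direction = ifelse(change > 0
                         , "Positive"
                         , "Negative"))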

returns:

         date employee revenue previousDay change direction
       <date>    <chr>   <dbl>       <dbl>  <dbl>     <chr>
1  2017-03-01        A      10           0     10  Positive
2  2017-03-01        B      20           0     20  Positive
3  2017-03-01        C      30           0     30  Positive
4  2017-03-01        D      40           0     40  Positive
5  2017-03-01        E      10           0     10  Positive
6  2017-03-01        F      40           0     40  Positive
7  2017-03-02        A       8          10     -2  Negative
8  2017-03-02        B      10          20    -10  Negative
9  2017-03-02        C      20          30    -10  Negative
10 2017-03-02        D      50          40     10  Positive
# ... with 18 more rows
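To see where `previousDay`, `change` and `direction` come from, the same arithmetic can be traced by hand for a single employee. This is a base-R restatement for illustration, not what dplyr does internally; the first two revenue values are employee A's from the output above, and the last two are invented.

```r
# Employee A: 10 and 8 come from the output above; 12 and 12 are invented.
revenue <- c(10, 8, 12, 12)

# Base-R analogue of lag(revenue, default = 0): shift by one, pad with 0.
previousDay <- c(0, head(revenue, -1))
change <- revenue - previousDay
direction <- ifelse(change > 0, "Positive", "Negative")

# Because the first previousDay is 0, the changes accumulate back to revenue.
stopifnot(all(cumsum(change) == revenue))

data.frame(revenue, previousDay, change, direction)
```

Note that a day with no change (the last one here) is labelled "Negative", because the test is `change > 0`.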

Then, we can plot that using:

toPlot %>%
  ggplot(aes(xmin = date - 0.5
             , xmax = date + 0.5
             , ymin = previousDay
             , ymax = revenue
             , fill = direction)) +
  geom_rect(col = "black"
            , show.legend = FALSE) +
  facet_wrap(~employee
             , scales = "free_y") +
  scale_fill_brewer(palette = "Set1")

to give
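The `date - 0.5` and `date + 0.5` pair in the call above relies on `Date` arithmetic being done in days, so each box ends up one day wide and centred on its date. A small check of that, using an arbitrary date from the sample range:

```r
d <- as.Date("2017-03-02")

xmin <- d - 0.5   # half a day before
xmax <- d + 0.5   # half a day after

# Width of one box, in days
width <- as.numeric(xmax - xmin)

# The next day's box starts exactly where this one ends,
# so consecutive boxes touch without overlapping.
nextXmin <- (d + 1) - 0.5
touches <- nextXmin == xmax
```

Narrowing the offset (say, to 0.4) would leave a gap between consecutive days, if that reads better.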

Note that including "Total" throws off the scale (requiring the free scales), so I would prefer to omit it:

toPlot %>%
  filter(employee != "Total") %>%
  ggplot(aes(xmin = date - 0.5
             , xmax = date + 0.5
             , ymin = previousDay
             , ymax = revenue
             , fill = direction)) +
  geom_rect(col = "black"
            , show.legend = FALSE) +
  facet_wrap(~employee) +
  scale_fill_brewer(palette = "Set1")

That version keeps a shared y-axis, which allows direct comparison between employees, and the following plots the overall total on its own:

toPlot %>%
  filter(employee == "Total") %>%
  ggplot(aes(xmin = date - 0.5
             , xmax = date + 0.5
             , ymin = previousDay
             , ymax = revenue
             , fill = direction)) +
  geom_rect(col = "black"
            , show.legend = FALSE) +
  scale_fill_brewer(palette = "Set1")

though I still find line graphs to be easier to interpret (especially comparing employees):

toPlot %>%
  filter(employee != "Total") %>%
  ggplot(aes(x = date
             , y = revenue
             , col = employee)) +
  geom_line() +
  scale_color_brewer(palette = "Dark2")

If you want to plot the changes themselves by day, you can do:

toPlot %>%
  filter(employee != "Total") %>%
  ggplot(aes(x = date
             , y = change
             , fill = employee)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Dark2")

to get:

but now you are getting rather far from the "waterfall" plot outputs. If you really, really want to make a waterfall comparable across plots you can, but it is going to be rather ugly (I'd strongly recommend the line plot above instead).

Here, you need to manually move the boxes around, and this will require some tinkering if you change the output aspect ratio (or size) or the number of employees. You also need to include colors for both the employee and the direction of the change, which starts to look rough. This falls into the category of "can, but probably shouldn't" -- there is likely a better way to display these data.

toPlot %>%
  filter(employee != "Total") %>%
  ungroup() %>%
  mutate(empNumber = as.numeric(as.factor(employee))) %>%
  ggplot(aes(xmin = (empNumber) - 0.4
             , xmax = (empNumber) + 0.4
             , ymin = previousDay
             , ymax = revenue
             , col = direction
             , fill = employee)) +
  geom_rect(size = 1.5) +
  facet_grid(~date) +
  scale_fill_brewer(palette = "Dark2") +
  theme(axis.text.x = element_blank()
        , axis.ticks.x = element_blank())

gives
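The manual positioning above hinges on `empNumber`: converting the employee label to a factor and then to a number gives each employee a fixed integer slot, and the `- 0.4` / `+ 0.4` offsets make each box 0.8 units wide around that slot. A minimal illustration, using three made-up employee labels repeated over two days:

```r
# Hypothetical labels: three employees observed on two days
employee <- c("A", "B", "C", "A", "B", "C")

# Factor levels are sorted, so A -> 1, B -> 2, C -> 3 on every day
empNumber <- as.numeric(as.factor(employee))

positions <- data.frame(
  employee
  , xmin = empNumber - 0.4
  , xmax = empNumber + 0.4
)
```

Since the slot depends on the sorted factor levels, adding or removing an employee shifts the others, which is part of the tinkering mentioned above.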
