Calculate Percentage Change in R using dplyr

独厮守ぢ 2021-02-04 19:20

I want to calculate the percentage change in Profit by YEAR, which is a fairly simple task, but somehow I am getting NA. I have checked same questi

2 answers
  • 2021-02-04 19:46

    Assuming that your Profit column represents the profit in a given year, this function calculates the difference between year n and year n-1, divides it by the value of year n-1, and multiplies by 100 to get a percentage. If the value in year n-1 was zero, there is no valid percent change (the division yields Inf, or NaN when both years are zero). It is important that you group the data only by VERTICAL and not by YEAR as well.

    library(dplyr)

    profit_pct_change <- function(x) {
      x <- x[order(x$YEAR, decreasing = TRUE), ] # Sort rows from latest to earliest year
      pct_change <- -diff(x$Profit) / x$Profit[-1] * 100 # Percent change in profit from the preceding year
      data.frame(year = x$YEAR[-length(x$YEAR)], pct_change = pct_change) # One row per year that has a preceding year
    }
    
    df_vertical_growth %>% 
      group_by(VERTICAL) %>%
      do(profit_pct_change(.))
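
The arithmetic described above can be checked by hand on a single vertical. The following is a minimal base R sketch, not part of the answer's function: it uses the AGRICULTURE profits from the example data, and the names `previous`, `current`, and `pct_change_clean` are made up for illustration. Replacing the undefined cases with NA is one possible convention, assumed here.

```r
# Profit for one vertical, ordered from earliest to latest year
# (the AGRICULTURE values from the example data, 2013 to 2017)
profit <- c(4066693, 2370747, 0, 2053358, 0)

previous <- head(profit, -1)  # year n-1
current  <- tail(profit, -1)  # year n

# Percent change: (current - previous) / previous * 100
pct_change <- (current - previous) / previous * 100

# A zero base gives Inf (or NaN for 0/0); mark those entries as NA instead
pct_change_clean <- ifelse(previous == 0, NA, pct_change)

print(round(pct_change, 1))
print(round(pct_change_clean, 1))
```

The third entry (2015 to 2016) has a zero base, so it is Inf in the raw result and NA in the cleaned one.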
    
  • 2021-02-04 19:49

    The problem is that each group contains a single observation, presumably because the data are grouped by YEAR as well as VERTICAL, which leaves one unique year per vertical. The lag of a single observation is NA. Additionally, since the years are in descending order, you need lead rather than lag.
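
To make the point about one-row groups concrete, here is a small sketch on toy data. The data frame `toy` and its values are hypothetical, not the asker's data. It compares grouping by both columns with grouping by VERTICAL only.

```r
library(dplyr)

# Toy data (hypothetical values), years in descending order within each vertical
toy <- data.frame(
  YEAR     = c(2017L, 2016L, 2017L, 2016L),
  VERTICAL = c("A", "A", "B", "B"),
  Profit   = c(120, 100, 90, 100)
)

# Grouping by YEAR and VERTICAL: every group has exactly one row,
# so lead() has no neighbouring row and returns NA everywhere
by_both <- toy %>%
  group_by(YEAR, VERTICAL) %>%
  mutate(pct_change = (Profit / lead(Profit) - 1) * 100) %>%
  ungroup()

# Grouping by VERTICAL only: lead() can see the previous year's row
by_vertical <- toy %>%
  group_by(VERTICAL) %>%
  mutate(pct_change = (Profit / lead(Profit) - 1) * 100) %>%
  ungroup()

print(by_both)
print(by_vertical)
```

In `by_both` the whole pct_change column is NA, while in `by_vertical` only the earliest year of each vertical is NA.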

    library(tidyverse)
    z %>%
      group_by(VERTICAL) %>% 
      mutate(pct_change = (Profit/lead(Profit) - 1) * 100)
    #output
        YEAR VERTICAL       Profit pct_change
       <int> <fctr>          <int>      <dbl>
     1  2017 AGRICULTURE         0    -100   
     2  2016 AGRICULTURE   2053358     Inf   
     3  2015 AGRICULTURE         0    -100   
     4  2014 AGRICULTURE   2370747    - 41.7 
     5  2013 AGRICULTURE   4066693      NA   
     6  2017 COMMUNICATION       0    -100   
     7  2016 COMMUNICATION 1680074      27.0 
     8  2015 COMMUNICATION 1322470    -  9.43
     9  2014 COMMUNICATION 1460133    -  4.56
    10  2013 COMMUNICATION 1529863      NA   
    

    This solution assumes the years are already arranged in the correct order within each group. To make sure they are, sort explicitly before computing the change:

    z %>%
      group_by(VERTICAL) %>% 
      arrange(YEAR, .by_group = TRUE) %>%
      mutate(pct_change = (Profit/lag(Profit) - 1) * 100)
    #output
        YEAR VERTICAL       Profit pct_change
       <int> <fctr>          <int>      <dbl>
     1  2013 AGRICULTURE   4066693      NA   
     2  2014 AGRICULTURE   2370747    - 41.7 
     3  2015 AGRICULTURE         0    -100   
     4  2016 AGRICULTURE   2053358     Inf   
     5  2017 AGRICULTURE         0    -100   
     6  2013 COMMUNICATION 1529863      NA   
     7  2014 COMMUNICATION 1460133    -  4.56
     8  2015 COMMUNICATION 1322470    -  9.43
     9  2016 COMMUNICATION 1680074      27.0 
    10  2017 COMMUNICATION       0    -100   
    

    Alternatively, sort the years in descending order and use lead instead of lag:

    arrange(desc(YEAR), .by_group = TRUE)
    
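
The sorting step matters because lag and lead look at neighbouring rows, not at the YEAR values themselves. A minimal sketch on toy data (the data frame `toy` and its values are hypothetical) shows how an unsorted group gives misleading results:

```r
library(dplyr)

# Toy data with rows deliberately out of order (hypothetical values)
toy <- data.frame(
  YEAR     = c(2014L, 2013L, 2015L),
  VERTICAL = "A",
  Profit   = c(120, 100, 150)
)

# Without sorting, lag() pairs each row with whichever row precedes it
unsorted <- toy %>%
  group_by(VERTICAL) %>%
  mutate(pct_change = (Profit / lag(Profit) - 1) * 100) %>%
  ungroup()

# Sorting by YEAR first pairs each year with the year before it
sorted <- toy %>%
  group_by(VERTICAL) %>%
  arrange(YEAR, .by_group = TRUE) %>%
  mutate(pct_change = (Profit / lag(Profit) - 1) * 100) %>%
  ungroup()

print(unsorted)
print(sorted)
```

In `unsorted`, 2015 is compared against 2013 and 2014 gets NA. In `sorted`, each year is compared against the year directly before it.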

    The example data z is:

    structure(list(YEAR = c(2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 
    2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 
    2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 
    2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 
    2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 
    2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 
    2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L
    ), VERTICAL = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
    2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
    6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L, 
    9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 
    11L, 12L, 12L, 12L, 12L, 12L), .Label = c("AGRICULTURE", "COMMUNICATION", 
    "CONSTRUCTION", "EDUCATION", "HEALTHCARE", "HOSPITALITY", "MANUFACTURING", 
    "MINING", "OTHER", "SERVICE", "TRANSPORTATION", "UTILITY"), class = "factor"), 
        Profit = c(0L, 2053358L, 0L, 2370747L, 4066693L, 0L, 1680074L, 
        1322470L, 1460133L, 1529863L, 0L, 0L, 0L, 8250149L, 0L, 0L, 
        12497015L, 13437356L, 10856685L, 13881127L, 0L, 0L, 0L, 4554364L, 
        5078130L, 0L, 4445512L, 5499419L, 9060639L, 4391522L, 0L, 
        0L, 0L, 0L, 27466974L, 0L, 4359251L, 4163201L, 6272530L, 
        6668191L, 0L, 0L, 0L, 5935199L, 3585969L, 0L, 0L, 0L, 0L, 
        28018522L, 0L, 0L, 0L, 0L, 8430244L, 0L, 3551989L, 6535248L, 
        3995486L, 4477617L)), .Names = c("YEAR", "VERTICAL", "Profit"
    ), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
    "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", 
    "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", 
    "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", 
    "48", "49", "50", "61", "62", "63", "64", "65", "66", "67", "68", 
    "69", "70", "71", "72", "73", "74", "75"), class = "data.frame")
    