How can I calculate the percentage change within a group for multiple columns in R?

误落风尘 2021-02-05 11:20

I have a data frame with an ID column, a date column (12 months for each ID), and 23 numeric variables. I would like to obtain the percentage change by month within each

2 Answers
  • 2021-02-05 11:32

    The issue you are running into is that your data is not in a "tidy" format. Your measurements (V1:V3) sit in separate columns, which makes the data frame "wide", whereas the tidyverse works best with long format. The good news is that the gather() function gets you exactly what you need. Here's a solution using the tidyverse.


    library(tidyverse)
    
    # Recreate data set
    df <- tribble(
        ~ID, ~Date, ~V1, ~V2, ~V3,
        1,  "Jan",   2,  3,  5,
        1,  "Feb",   3,  4,  6,
        1,  "Mar",   7,  8,  9,
        2,  "Jan",   1,  1,  1,
        2,  "Feb",   2,  3,  4,
        2,  "Mar",   7,  8,  8
    )
    df
    #> # A tibble: 6 × 5
    #>      ID  Date    V1    V2    V3
    #>   <dbl> <chr> <dbl> <dbl> <dbl>
    #> 1     1   Jan     2     3     5
    #> 2     1   Feb     3     4     6
    #> 3     1   Mar     7     8     9
    #> 4     2   Jan     1     1     1
    #> 5     2   Feb     2     3     4
    #> 6     2   Mar     7     8     8
    
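    # Reshape to long format, then compute the change relative to the previous row
    # within each ID/variable group (a fraction; multiply by 100 for percent)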
    df %>%
        gather(key = key, value = value, V1:V3) %>%
        group_by(ID, key) %>%
        mutate(lag = lag(value)) %>%
        mutate(pct.change = (value - lag) / lag)
    #> Source: local data frame [18 x 6]
    #> Groups: ID, key [6]
    #> 
    #>       ID  Date   key value   lag pct.change
    #>    <dbl> <chr> <chr> <dbl> <dbl>      <dbl>
    #> 1      1   Jan    V1     2    NA         NA
    #> 2      1   Feb    V1     3     2  0.5000000
    #> 3      1   Mar    V1     7     3  1.3333333
    #> 4      2   Jan    V1     1    NA         NA
    #> 5      2   Feb    V1     2     1  1.0000000
    #> 6      2   Mar    V1     7     2  2.5000000
    #> 7      1   Jan    V2     3    NA         NA
    #> 8      1   Feb    V2     4     3  0.3333333
    #> 9      1   Mar    V2     8     4  1.0000000
    #> 10     2   Jan    V2     1    NA         NA
    #> 11     2   Feb    V2     3     1  2.0000000
    #> 12     2   Mar    V2     8     3  1.6666667
    #> 13     1   Jan    V3     5    NA         NA
    #> 14     1   Feb    V3     6     5  0.2000000
    #> 15     1   Mar    V3     9     6  0.5000000
    #> 16     2   Jan    V3     1    NA         NA
    #> 17     2   Feb    V3     4     1  3.0000000
    #> 18     2   Mar    V3     8     4  1.0000000
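
    As a variation on the approach above, here is a minimal sketch that swaps gather() for pivot_longer(), which tidyr now recommends in its place, and sorts the months before lagging. It assumes the Date column holds three-letter English month abbreviations, as in the toy data; the names long and pct_change are only illustrative.

    ```r
    library(tidyverse)

    # Same toy data as in the answer above
    df <- tribble(
        ~ID, ~Date, ~V1, ~V2, ~V3,
        1,  "Jan",   2,  3,  5,
        1,  "Feb",   3,  4,  6,
        1,  "Mar",   7,  8,  9,
        2,  "Jan",   1,  1,  1,
        2,  "Feb",   2,  3,  4,
        2,  "Mar",   7,  8,  8
    )

    long <- df %>%
        # Assumption: months are abbreviations, so month.abb gives calendar order
        mutate(Date = factor(Date, levels = month.abb)) %>%
        pivot_longer(V1:V3, names_to = "key", values_to = "value") %>%
        # lag() works on row order, so sort chronologically within each group first
        arrange(ID, key, Date) %>%
        group_by(ID, key) %>%
        mutate(pct_change = (value - lag(value)) / lag(value) * 100) %>%
        ungroup()

    long
    ```

    The explicit arrange() matters when the rows are not already in month order, because lag() simply takes the previous row within the group.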
    
  • 2021-02-05 11:36

    How about defining pct <- function(x) x/lag(x), with lag() taken from dplyr? That gives the ratio to the previous value; use (x/lag(x) - 1)*100 instead, or whatever definition of percent change you prefer. For example:

    pct(1:3)
    [1]  NA 2.0 1.5
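
    To make the difference between the ratio and the percent change concrete, here is a small sketch of the second formula. The helper name pct_change is made up for illustration, and dplyr must be attached: base R's stats::lag() shifts the time index of a series rather than its values, so it would not give the intended result.

    ```r
    library(dplyr)  # provides the value-shifting lag() used below

    # Hypothetical helper: percent change relative to the previous element
    pct_change <- function(x) (x / lag(x) - 1) * 100

    out <- pct_change(c(2, 3, 7))
    out
    # first element is NA (no previous value), then 50 and about 133.33
    ```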
    

    Edit: Adding Frank's suggestion

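    library(dplyr)  # pct() relies on dplyr::lag(), not stats::lag()

    pct <- function(x) {x/lag(x)}
    
    # dt is the asker's data frame; across() replaces the deprecated mutate_each()/funs()
    dt %>% group_by(ID) %>% mutate(across(c(V1, V2, V3), pct))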
    
    ID Date       V1       V2  V3
    1  Jan       NA       NA  NA
    1  Feb 1.500000 1.333333 1.2
    1  Mar 2.333333 2.000000 1.5
    2  Jan       NA       NA  NA
    2  Feb 2.000000 3.000000 4.0
    2  Mar 3.500000 2.666667 2.0
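
    If you would rather keep the original columns and get the percent change rather than the ratio, a sketch along the following lines may help. It rebuilds the toy data as dt, uses a hypothetical helper pct_change, and assumes a dplyr version that provides across() (1.0.0 or later) and rows already sorted by month within each ID.

    ```r
    library(dplyr)

    # Toy data matching the example in this thread
    dt <- data.frame(
        ID   = c(1, 1, 1, 2, 2, 2),
        Date = c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar"),
        V1   = c(2, 3, 7, 1, 2, 7),
        V2   = c(3, 4, 8, 1, 3, 8),
        V3   = c(5, 6, 9, 1, 4, 8)
    )

    # Hypothetical helper: percent change relative to the previous row
    pct_change <- function(x) (x / lag(x) - 1) * 100

    res <- dt %>%
        group_by(ID) %>%
        # .names writes the results to new columns (V1_pct, ...) instead of overwriting
        mutate(across(V1:V3, pct_change, .names = "{.col}_pct")) %>%
        ungroup()

    res
    ```

    With 23 variables, the V1:V3 selection could be swapped for a range covering all of them, or for where(is.numeric); grouping columns such as ID are left out of across() automatically.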
    