Conditional Cumulative Sum in R

无人共我 2021-01-02 06:34

I have a time series data frame and want to compute cumulative returns for stock symbols intra-day for a range of dates. When the symbol and/or date changes the cumulative

3 Answers
  • 2021-01-02 07:12

    Sample data (note: lubridate is loaded only for its dmy function, which parses the date strings as day/month/year):

    library(lubridate) 
    df <- data.frame(
      Date = dmy( c( "1/2/2013", "1/2/2013", "1/2/2013", "1/2/2013"
                     , "1/2/2013", "1/2/2013", "1/3/2013", "1/3/2013", "1/3/2013" ) ),
      Symbol = c( "AA", "AA", "AA", "AAPL", "AAPL", "AAPL", "AA", "AA", "AA" ),
      Return = c( NA, 1.19, 0.89, NA, 0.22, 0.21, NA, -1.80, -0.52 )
    )
    

    Now, using dplyr, you can group_by your dataframe and create the desired column Cum_Sum:

    library(dplyr)
    df %>% group_by(Date, Symbol) %>% 
      mutate( Return_aux = ifelse( is.na(Return), 0, Return ), # treat NA as 0
              Cum_Sum = cumsum(Return_aux) )
    
    # A tibble: 9 x 5
    # Groups:   Date, Symbol [3]
      Date       Symbol Return Return_aux Cum_Sum
      <date>     <fct>   <dbl>      <dbl>   <dbl>
    1 2013-02-01 AA      NA          0       0   
    2 2013-02-01 AA       1.19       1.19    1.19
    3 2013-02-01 AA       0.89       0.89    2.08
    4 2013-02-01 AAPL    NA          0       0   
    5 2013-02-01 AAPL     0.22       0.22    0.22
    6 2013-02-01 AAPL     0.21       0.21    0.43
    7 2013-03-01 AA      NA          0       0   
    8 2013-03-01 AA      -1.8       -1.8    -1.8 
    9 2013-03-01 AA      -0.52      -0.52   -2.32
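
    If dplyr is not available, roughly the same result can be sketched in base R with ave(), which applies a function within groups and returns a vector aligned with the original rows. The snippet below rebuilds the sample data without lubridate (writing the dates in ISO form is an assumption made for the illustration) and is only a minimal sketch, not a replacement for the dplyr pipeline above.

    ```r
    # Same sample values as above; dates written in ISO form (assumption)
    df <- data.frame(
      Date   = as.Date(c(rep("2013-02-01", 6), rep("2013-03-01", 3))),
      Symbol = c("AA", "AA", "AA", "AAPL", "AAPL", "AAPL", "AA", "AA", "AA"),
      Return = c(NA, 1.19, 0.89, NA, 0.22, 0.21, NA, -1.80, -0.52)
    )

    # Treat a missing return as 0 so each running total starts at 0
    ret0 <- ifelse(is.na(df$Return), 0, df$Return)

    # ave() runs cumsum separately for every Date/Symbol combination
    # and puts the results back in the original row order
    df$Cum_Sum <- ave(ret0, df$Date, df$Symbol, FUN = cumsum)

    print(df)
    ```

    Because ave() keeps the original row order, the new column can be assigned straight back to the data frame.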
    
  • Using the data.table package, this is trivial. If your data is in a data.frame called dat:

    library(data.table)
    DT <- data.table(dat)
    
    DT[, your_cumsum_function(.SD), by=c('Date', 'Symbol')]
    

    Here .SD is the subset of the data.table for each group defined by the by argument. See ?data.table for more information.

    You can also pass column names directly:

    DT[, your_cumsum_function(Last), by=c('Date', 'Symbol')]
    

    In your particular example, do:

    DT[, Return := as.numeric(sub('%$', '', Return))]
    DT[!is.na(Return), Cumulative.Sum := cumsum(Return), by = c('Date', 'Symbol')]
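
    One thing to be aware of: because the update is restricted to rows where Return is not NA, the skipped rows keep NA in Cumulative.Sum. The sketch below shows this on a small table and adds a variant that counts NA as 0 instead. The sample values and the column name Cum.Sum.Filled are assumptions made for the illustration (Return is assumed to arrive as text with a trailing percent sign).

    ```r
    library(data.table)

    # Hypothetical sample data; Return is text with a trailing "%"
    DT <- data.table(
      Date   = c(rep("1/2/2013", 6), rep("1/3/2013", 3)),
      Symbol = c("AA", "AA", "AA", "AAPL", "AAPL", "AAPL", "AA", "AA", "AA"),
      Return = c(NA, "1.19%", "0.89%", NA, "0.22%", "0.21%", NA, "-1.80%", "-0.52%")
    )

    # Strip the percent sign and convert to numeric (NA stays NA)
    DT[, Return := as.numeric(sub("%$", "", Return))]

    # Variant A (as above): NA rows are skipped, so they keep NA in the new column
    DT[!is.na(Return), Cumulative.Sum := cumsum(Return), by = c("Date", "Symbol")]

    # Variant B: count NA as 0 so every row gets a running total
    DT[, Cum.Sum.Filled := cumsum(ifelse(is.na(Return), 0, Return)),
       by = c("Date", "Symbol")]

    print(DT)
    ```

    Which variant is preferable depends on whether the first row of each day should show 0 or stay missing.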
    
  • 2021-01-02 07:34

    This is a typical case for the split-apply-combine strategy: you split your data.frame by the unique combinations of certain columns (Date and Symbol), apply a procedure to each subset (cumsum on Return), and combine the subsets back into one large data.frame. This can be achieved easily with ddply from the plyr package:

    mdf$Return <- as.numeric(sub( "(\\d+\\.\\d+)\\%", "\\1", mdf$Return ))
    mdf$Return[ is.na(mdf$Return) ] <- 0
    
    library(plyr)
    ddply(mdf, .(Date,Symbol), transform, Cumulative.Sum = cumsum(Return))
    
          Date Symbol  Time   Last Return Cumulative.Sum
    1 1/2/2013     AA  9:30  42.00   0.00           0.00
    2 1/2/2013     AA 12:00  42.50   1.19           1.19
    3 1/2/2013     AA 16:00  42.88   0.89           2.08
    4 1/2/2013   AAPL  9:30 387.00   0.00           0.00
    5 1/2/2013   AAPL 12:00 387.87   0.22           0.22
    6 1/2/2013   AAPL 16:00 388.69   0.21           0.43
    7 1/3/2013     AA  9:30  42.88   0.00           0.00
    8 1/3/2013     AA 12:00  42.11  -1.80          -1.80
    9 1/3/2013     AA 16:00  41.89  -0.52          -2.32
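
    To make the three steps explicit, here is a minimal base R sketch of the same split-apply-combine idea using split(), lapply() and rbind(). The data frame is rebuilt from the output above with the Time and Last columns left out, and the names pieces and result are chosen only for this illustration.

    ```r
    # Sample data reconstructed from the output above (Time and Last omitted)
    mdf <- data.frame(
      Date   = c(rep("1/2/2013", 6), rep("1/3/2013", 3)),
      Symbol = c("AA", "AA", "AA", "AAPL", "AAPL", "AAPL", "AA", "AA", "AA"),
      Return = c(0, 1.19, 0.89, 0, 0.22, 0.21, 0, -1.80, -0.52),
      stringsAsFactors = FALSE
    )

    # Split: one data frame per Date/Symbol combination, dropping empty ones
    pieces <- split(mdf, list(mdf$Date, mdf$Symbol), drop = TRUE)

    # Apply: add the running total within each piece
    pieces <- lapply(pieces, function(d) {
      transform(d, Cumulative.Sum = cumsum(Return))
    })

    # Combine: stack the pieces back into a single data frame
    result <- do.call(rbind, pieces)
    rownames(result) <- NULL

    print(result)
    ```

    The rows come back grouped by combination, so their order may differ from the input; within each group the original order is preserved, which is what cumsum needs.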
    