Question
I have a time series data frame and want to compute intra-day cumulative returns for each stock symbol over a range of dates. The cumulative return should reset whenever the symbol and/or the date changes. Any help would be appreciated. A small sample of my data frame is below, including the values the Cumulative.Sum column should contain. Thanks.
Date Symbol Time Last Return Cumulative.Sum
1 1/2/2013 AA 9:30 42.00 n/a n/a
2 1/2/2013 AA 12:00 42.50 1.19% 1.19%
3 1/2/2013 AA 16:00 42.88 0.89% 2.08%
4 1/2/2013 AAPL 9:30 387.00 n/a n/a
5 1/2/2013 AAPL 12:00 387.87 0.22% 0.22%
6 1/2/2013 AAPL 16:00 388.69 0.21% 0.44%
7 1/3/2013 AA 9:30 42.88 n/a n/a
8 1/3/2013 AA 12:00 42.11 -1.80% -1.80%
9 1/3/2013 AA 16:00 41.89 -0.52% -2.32%
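The expected Cumulative.Sum values look like running sums of the unrounded percentage returns rather than of the rounded Return column: summing the rounded AAPL returns gives 0.22 + 0.21 = 0.43, not 0.44. A minimal base R sketch, assuming each Return is the percentage change in Last from the previous row of the same Date/Symbol group, recomputes both columns with ave(). The helper names pct_return and running_sum are illustrative only.

```r
# Sample from the question (Time omitted; it is not needed for the calculation).
dat <- data.frame(
  Date   = c("1/2/2013", "1/2/2013", "1/2/2013", "1/2/2013", "1/2/2013",
             "1/2/2013", "1/3/2013", "1/3/2013", "1/3/2013"),
  Symbol = c("AA", "AA", "AA", "AAPL", "AAPL", "AAPL", "AA", "AA", "AA"),
  Last   = c(42.00, 42.50, 42.88, 387.00, 387.87, 388.69, 42.88, 42.11, 41.89)
)

# Assumed definition of Return: percentage change from the previous price.
# The first row of a group has no previous price, so it gets NA ("n/a").
pct_return <- function(p) c(NA, diff(p) / head(p, -1) * 100)

# Running sum that keeps the leading NA in place.
running_sum <- function(r) c(NA, cumsum(r[-1]))

# ave() applies each helper separately within every Date/Symbol combination,
# so the sums restart whenever the date or the symbol changes.
dat$Return <- ave(dat$Last, dat$Date, dat$Symbol, FUN = pct_return)
dat$Cumulative.Sum <- ave(dat$Return, dat$Date, dat$Symbol, FUN = running_sum)

round(dat[, c("Return", "Cumulative.Sum")], 2)
```

Rounded to two decimals, the Cumulative.Sum column reads NA, 1.19, 2.08, NA, 0.22, 0.44, NA, -1.80, -2.32, which matches the table above.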
Answer 1:
Using the data.table package, this is trivial. If your data is in a data.frame called dat:
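library(data.table)
DT <- data.table(dat)  # copy the data.frame into a data.table
# your_cumsum_function is a placeholder for whatever you want computed per group
DT[, your_cumsum_function(.SD), by = c('Date', 'Symbol')]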
Here, .SD is the subset of the data.table for the current by group (the grouping columns themselves are not included). See ?data.table for more information.
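To make the placeholder concrete, here is one hypothetical your_cumsum_function: it takes the rows of a single Date/Symbol group (the role .SD plays) and returns that group's running sum of Return. It assumes Return is already numeric and that only the first row of each group is NA. The sketch runs it on one hand-picked group in plain base R, so data.table is not needed to try it; the extra Date and Symbol columns in that subset are simply ignored.

```r
# Hypothetical implementation of the your_cumsum_function placeholder.
# 'sd' holds the rows of ONE Date/Symbol group; the first row's Return is
# assumed to be NA (no previous price), so it is skipped and kept as NA.
your_cumsum_function <- function(sd) {
  data.frame(Cumulative.Sum = c(NA, cumsum(sd$Return[-1])))
}

# The question's sample, with Return already converted to numbers.
dat <- data.frame(
  Date   = c("1/2/2013", "1/2/2013", "1/2/2013", "1/2/2013", "1/2/2013",
             "1/2/2013", "1/3/2013", "1/3/2013", "1/3/2013"),
  Symbol = c("AA", "AA", "AA", "AAPL", "AAPL", "AAPL", "AA", "AA", "AA"),
  Return = c(NA, 1.19, 0.89, NA, 0.22, 0.21, NA, -1.80, -0.52)
)

# Select a single group by hand, mimicking what one 'by' group would contain.
aapl_day1 <- dat[dat$Date == "1/2/2013" & dat$Symbol == "AAPL", ]
your_cumsum_function(aapl_day1)
```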
You can also pass column names directly:
DT[, your_cumsum_function(Last), by=c('Date', 'Symbol')]
In your particular example, do:
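# strip the trailing % and convert to numeric ("n/a" becomes NA, with a coercion warning)
DT[, Return := as.numeric(sub('%$', '', Return))]
# running sum per Date/Symbol group, computed only on rows that have a Return
DT[!is.na(Return), Cumulative.Sum := cumsum(Return), by = c('Date', 'Symbol')]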
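If data.table is not available, roughly the same two steps can be sketched in base R with sub() and ave(). This is a swapped-in technique, not part of this answer. It assumes Return was read as text such as "1.19%" and "n/a", and, like the data.table version, it leaves Cumulative.Sum as NA on rows without a return. The helper name cumsum_skip_na is illustrative.

```r
dat <- data.frame(
  Date   = c("1/2/2013", "1/2/2013", "1/2/2013", "1/2/2013", "1/2/2013",
             "1/2/2013", "1/3/2013", "1/3/2013", "1/3/2013"),
  Symbol = c("AA", "AA", "AA", "AAPL", "AAPL", "AAPL", "AA", "AA", "AA"),
  Return = c("n/a", "1.19%", "0.89%", "n/a", "0.22%", "0.21%",
             "n/a", "-1.80%", "-0.52%")
)

# Step 1: drop the trailing % and convert; "n/a" turns into NA
# (the coercion warning is suppressed on purpose).
dat$Return <- suppressWarnings(as.numeric(sub("%$", "", dat$Return)))

# Step 2: running sum inside each Date/Symbol group, computed only over
# non-missing returns; NA rows stay NA.
cumsum_skip_na <- function(r) {
  out <- rep(NA_real_, length(r))
  ok <- !is.na(r)
  out[ok] <- cumsum(r[ok])
  out
}
dat$Cumulative.Sum <- ave(dat$Return, dat$Date, dat$Symbol, FUN = cumsum_skip_na)
dat
```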
Answer 2:
This is a typical case for the split-apply-combine strategy: you split your data.frame by unique combinations of certain columns (Date and Symbol), apply a procedure to each subset (cumsum on Return), and combine the subsets back into one large data.frame. This can be done easily with ddply from the plyr package:
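# mdf is your data frame; drop the % sign ("n/a" becomes NA, with a coercion warning)
mdf$Return <- as.numeric(sub( "(\\d+\\.\\d+)\\%", "\\1", mdf$Return ))
# treat missing returns (the first row of each group) as 0
mdf$Return[ is.na(mdf$Return) ] <- 0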
library(plyr)
ddply(mdf, .(Date,Symbol), transform, Cumulative.Sum = cumsum(Return))
Date Symbol Time Last Return Cumulative.Sum
1 1/2/2013 AA 9:30 42.00 0.00 0.00
2 1/2/2013 AA 12:00 42.50 1.19 1.19
3 1/2/2013 AA 16:00 42.88 0.89 2.08
4 1/2/2013 AAPL 9:30 387.00 0.00 0.00
5 1/2/2013 AAPL 12:00 387.87 0.22 0.22
6 1/2/2013 AAPL 16:00 388.69 0.21 0.43
7 1/3/2013 AA 9:30 42.88 0.00 0.00
8 1/3/2013 AA 12:00 42.11 -1.80 -1.80
9 1/3/2013 AA 16:00 41.89 -0.52 -2.32
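The split, apply and combine steps that ddply performs can also be spelled out in base R, which makes each stage visible. This is an illustration, not what plyr does internally. It starts from the raw strings and reuses the sub() pattern above (with the unnecessary backslash before % dropped); the minus sign in "-1.80%" survives because sub() replaces only the matched digits and %. Missing returns are set to 0 as in this answer, and unsplit() puts the rows back in their original order.

```r
mdf <- data.frame(
  Date   = c("1/2/2013", "1/2/2013", "1/2/2013", "1/2/2013", "1/2/2013",
             "1/2/2013", "1/3/2013", "1/3/2013", "1/3/2013"),
  Symbol = c("AA", "AA", "AA", "AAPL", "AAPL", "AAPL", "AA", "AA", "AA"),
  Return = c("n/a", "1.19%", "0.89%", "n/a", "0.22%", "0.21%",
             "n/a", "-1.80%", "-0.52%")
)

# Same parsing as above: keep the number, drop the %; "n/a" becomes NA, then 0.
mdf$Return <- suppressWarnings(as.numeric(sub("(\\d+\\.\\d+)%", "\\1", mdf$Return)))
mdf$Return[is.na(mdf$Return)] <- 0

groups <- list(mdf$Date, mdf$Symbol)

# Split: one data frame per Date/Symbol combination that actually occurs.
pieces <- split(mdf, groups, drop = TRUE)

# Apply: add the running sum to each piece.
pieces <- lapply(pieces, function(d) {
  d$Cumulative.Sum <- cumsum(d$Return)
  d
})

# Combine: reassemble the pieces in the original row order.
result <- unsplit(pieces, groups, drop = TRUE)
result
```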
Answer 3:
Sample data (note: I used the lubridate library only to call the dmy function):
library(lubridate)
df <- data.frame(
Date = dmy( c( "1/2/2013", "1/2/2013", "1/2/2013", "1/2/2013"
, "1/2/2013", "1/2/2013", "1/3/2013", "1/3/2013", "1/3/2013" ) ),
Symbol = c( "AA", "AA", "AA", "AAPL", "AAPL", "AAPL", "AA", "AA", "AA" ),
Return = c( NA, 1.19, 0.89, NA, 0.22, 0.21, NA, -1.80, -0.52 )
)
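One detail about this sample data: dmy() reads "1/2/2013" as 1 February 2013, which is why the output below shows 2013-02-01 and 2013-03-01. If the question's dates are month/day/year (an assumption; the question does not say), they would be parsed month first instead, for example with lubridate's mdy(). The base R sketch below shows both readings with as.Date(). Either way the grouping, and therefore Cum_Sum, is unchanged, because grouping only needs the two dates to be distinct.

```r
x <- c("1/2/2013", "1/3/2013")

day_first   <- as.Date(x, format = "%d/%m/%Y")  # same reading as dmy()
month_first <- as.Date(x, format = "%m/%d/%Y")  # same reading as mdy()

format(day_first)    # "2013-02-01" "2013-03-01"
format(month_first)  # "2013-01-02" "2013-01-03"
```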
Now, using dplyr, you can group_by your data frame and create the desired column Cum_Sum:
library(dplyr)
df %>% group_by(Date, Symbol) %>%
  mutate( Return_aux = ifelse( is.na(Return), 0, Return ), # treat NA as 0
          Cum_Sum = cumsum(Return_aux) )
# A tibble: 9 x 5
# Groups: Date, Symbol [3]
Date Symbol Return Return_aux Cum_Sum
<date> <fct> <dbl> <dbl> <dbl>
1 2013-02-01 AA NA 0 0
2 2013-02-01 AA 1.19 1.19 1.19
3 2013-02-01 AA 0.89 0.89 2.08
4 2013-02-01 AAPL NA 0 0
5 2013-02-01 AAPL 0.22 0.22 0.22
6 2013-02-01 AAPL 0.21 0.21 0.43
7 2013-03-01 AA NA 0 0
8 2013-03-01 AA -1.8 -1.8 -1.8
9 2013-03-01 AA -0.52 -0.52 -2.32
Source: https://stackoverflow.com/questions/16741683/conditional-cumulative-sum-in-r