Question
I have a dataset containing daily closing prices of 5413 companies from 2000 to 2014. I want to calculate daily log returns for the stocks by date, as log(price today / price yesterday). The dataset looks like this:
Date A G L ABA ABB ABBEY
2000-1-3 NA NA NA NA
2000-1-4 79.5 325 NA 961
2000-1-5 79.5 322.5 NA 945
2000-1-6 79.5 327.5 NA 952
2000-1-7 NA 327.5 NA 941
2000-1-10 79.5 327.5 NA 946
2000-1-11 79.5 327.5 NA 888
How can I calculate the daily log returns and also deal with the NA values? My sample period is 2000 to 2014, and some companies were only listed in 2001, so they have NA for the whole of 2000. How should this be handled? Your help is highly appreciated.
Answer 1:
Here is my suggestion for this problem:
As I understand it, we have a dataframe of dates and thousands of companies. Here is an example dataframe called prices:
> prices
newdates nsp1 nsp2 nsp3 nsp4
1 2000-01-03 NA NA NA NA
2 2000-01-04 79.5 325.0 NA 961
3 2000-01-05 79.5 322.5 NA 945
4 2000-01-06 79.5 327.5 NA 952
5 2000-01-07 NA 327.5 NA 941
6 2000-01-10 79.5 327.5 NA 946
7 2000-01-11 79.5 327.5 NA 888
To create a new dataframe of log-returns I used the code below:
> logs=data.frame(
+   cbind.data.frame(
+     newdates[-1],
+     diff(as.matrix(log(prices[,-1])))
+   )
+ )
> logs
newdates..1. nsp1 nsp2 nsp3 nsp4
1 2000-01-04 NA NA NA NA
2 2000-01-05 0 -0.007722046 NA -0.016789481
3 2000-01-06 0 0.015384919 NA 0.007380107
4 2000-01-07 NA 0.000000000 NA -0.011621895
5 2000-01-10 NA 0.000000000 NA 0.005299429
6 2000-01-11 0 0.000000000 NA -0.063270826
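For anyone who wants to reproduce the output above, here is a minimal sketch that rebuilds the example data. It assumes that newdates also exists as a standalone Date vector in the workspace (the code above refers to newdates[-1] directly), and it types the values in by hand from the printed table.

```r
# Assumption: the dates are available as a standalone vector called newdates.
newdates <- as.Date(c("2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06",
                      "2000-01-07", "2000-01-10", "2000-01-11"))

# Example prices copied from the table above; nsp3 is entirely missing.
prices <- data.frame(
  newdates = newdates,
  nsp1 = c(NA, 79.5, 79.5, 79.5, NA, 79.5, 79.5),
  nsp2 = c(NA, 325.0, 322.5, 327.5, 327.5, 327.5, 327.5),
  nsp3 = rep(NA_real_, 7),
  nsp4 = c(NA, 961, 945, 952, 941, 946, 888)
)

# Same call as above: first differences of the log prices, with dates attached.
logs <- data.frame(
  cbind.data.frame(
    newdates[-1],
    diff(as.matrix(log(prices[, -1])))
  )
)
logs
```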
To clarify what is going on in this code, let's analyze it from the inside out:
Step 1: Calculating log-returns
- You know that log(a/b) = log(a)-log(b), so we can calculate differences of logarithms. The function diff(x,lag=1) calculates differences with a given lag. Here it is lag=1, so it gives first differences.
- Our x is the prices in the dataframe. To pick every column of a data.frame except the first (which holds the dates) we use prices[,-1].
- We need logarithms, so log(prices[,-1]).
- The function diff() works with a vector or a matrix, so we need to treat the calculated logarithms as a matrix, thus as.matrix(log(prices[,-1])).
- Now we can use diff() with lag=1, so diff(as.matrix(log(prices[,-1]))).
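As a quick check of the identity used in Step 1, the following sketch compares diff(log(p)) with the ratio form log(price today / price yesterday) on a few prices taken from the example. The object names p, r_diff and r_ratio are only illustrative.

```r
# A few prices from the example (columns nsp2 and nsp4, 2000-01-04 to 2000-01-06).
p <- cbind(nsp2 = c(325.0, 322.5, 327.5),
           nsp4 = c(961, 945, 952))

# First differences of the log prices, computed column by column.
r_diff <- diff(log(p))

# The same returns written directly as log(price today / price yesterday).
r_ratio <- log(p[-1, ] / p[-nrow(p), ])

# Both forms agree up to floating-point error.
all.equal(r_diff, r_ratio, check.attributes = FALSE)
```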
Step 2: Creating a dataframe of log-returns and dates
We can't just use cbind(). Firstly, the lengths are different (the returns are shorter by 1 record), so we need to remove the first date with newdates[-1]. Secondly, with cbind() the dates would be transformed into numeric values such as 160027.
Here we have to use cbind.data.frame(x,y), as seen above.
Now the data is ready, and we can wrap it in data.frame() and name the result logs, so logs=data.frame(...) as above.
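The difference between the two binding functions can be seen on a tiny example. This is only a sketch: dates, rets, m and d are made-up names, and the exact number you get from cbind() depends on how your dates are stored (here they are of class Date).

```r
dates <- as.Date(c("2000-01-04", "2000-01-05", "2000-01-06"))
rets  <- matrix(c(NA, 0, 0), ncol = 1, dimnames = list(NULL, "nsp1"))

# cbind() builds a numeric matrix, so the dates lose their class
# and become day counts since 1970-01-01.
m <- cbind(dates, rets)
class(m[, 1])

# cbind.data.frame() builds a data frame, so the dates stay dates.
d <- cbind.data.frame(dates, rets)
class(d$dates)
```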
If your dataset looks like the dataframe prices, it should run. The most important thing is to use diff(log(x)) to easily calculate log-returns.
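On the NA question: with diff(log(x)), a company that is listed later simply has NA returns until its second available price, which is usually what you want. A missing price in the middle of a series produces two NA returns (see nsp1 on 2000-01-07 and 2000-01-10 in the output). If you would rather compute the return between consecutive available prices, one possible approach is sketched below. The helper log_returns_skip_na is hypothetical, not part of any package, and whether a return spanning a gap is acceptable depends on your analysis.

```r
# Hypothetical helper: log returns between consecutive non-missing prices.
# Days without a price, and days before the second available price, stay NA.
log_returns_skip_na <- function(p) {
  r  <- rep(NA_real_, length(p))
  ok <- which(!is.na(p))
  if (length(ok) > 1) r[ok[-1]] <- diff(log(p[ok]))
  r[-1]  # drop the first day, which has no previous price
}

nsp1 <- c(NA, 79.5, 79.5, 79.5, NA, 79.5, 79.5)
x <- log_returns_skip_na(nsp1)
x

# Applied to every price column of a data frame shaped like prices:
# sapply(prices[, -1], log_returns_skip_na)
```

Here the return on 2000-01-10 is measured against the price on 2000-01-06, so it covers more than one trading day.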
If you have any questions or problems, just ask.
Source: https://stackoverflow.com/questions/34514240/log-returns-of-multiple-securities-for-multiple-time-period-in-r