I am having trouble with some calculations on a data frame.
Here is my DF (with many more rows and columns)
What I am trying to do is:
Step (1) - For e
We can first separate "Assets"
and "Returns"
columns and then use colSums
and multiply values accordingly
asset_col <- grep("^Assets", names(df1))
return_col <- grep("^Returns", names(df1))
colSums(t(t(df1[asset_col])/colSums(df1[asset_col])) * df1[return_col])
#Returns_Jan_2018 Returns_Feb_2018
# 3.504230 4.633941
To break it down and make clear each step
Step 1 - For each month I would like to sum the assets columns
colSums(df1[asset_col])
#Assets_Jan_2018 Assets_Feb_2018
# 1466742 2049689
Step 2 - For each firm, I would like to divide assets each month by the total for the month
t(t(df1[asset_col])/colSums(df1[asset_col]))
# Assets_Jan_2018 Assets_Feb_2018
#[1,] 0.14333400 0.11485889
#[2,] 0.08395751 0.06202112
#[3,] 0.61217106 0.38532577
#[4,] 0.16053744 0.43779422
Step 3 - Then I would like to take the values from step (2) and multiply by the corresponding returns
t(t(df1[asset_col])/colSums(df1[asset_col])) * df1[return_col]
# Returns_Jan_2018 Returns_Feb_2018
#1 0.6450030 0.76955455
#2 0.4449748 0.07442534
#3 0.8570395 2.38901980
#4 1.5572131 1.40094151
Step 4 - I would like to sum each column in step (3)
colSums(t(t(df1[asset_col])/colSums(df1[asset_col])) * df1[return_col])
#Returns_Jan_2018 Returns_Feb_2018
# 3.504230 4.633941
Probably a lot less concise then Ronak's answer, but demonstrates the sweep functionality.
Construct example df:
df <- data.frame(cbind(Firm = c("A", "B", "C", "D"),
Assets_Jan_2018 = as.numeric(c(210234, 123144, 897897, 235467)),
Assets_Feb_2018 = as.numeric(c(235425, 127124, 789798, 897342)),
Returns_Jan_2018 = as.double(c(4.5, 5.3, 1.4, 9.7)),
Returns_Feb_2019 = as.double(c(6.7, 1.2, 6.2, 3.2))))
Ensure numeric data types:
df <- type.convert(df)
Calculate the weighted returns:
FirmWeightedReturns <- cbind(Firm = df$Firm,
sweep(df[sapply(df, is.numeric) & !(grepl("returns", tolower(colnames(df))))],
2,
as.numeric(sapply(df[sapply(df, is.numeric) & !(grepl("returns", tolower(colnames(df))))], sum)), '/')
* df[grepl("returns", tolower(colnames(df)))])
Summarise as a df:
TotalReturns <- data.frame(lapply(FirmWeightedReturns[sapply(FirmWeightedReturns, is.numeric)], sum))
Output to console:
TotalReturns