Sum only numerical columns and divide values by total

不思量自难忘° 2021-01-24 18:27

I am having trouble with some calculations on a data frame.

Here is my DF (with many more rows and columns)

What I am trying to do is:

Step (1) - For e

2 Answers
  • 2021-01-24 19:01

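    We can first select the "Assets" and "Returns" columns by name, then use colSums to get the monthly asset totals, divide each firm's assets by those totals, and multiply the resulting weights by the corresponding returns.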

    asset_col <- grep("^Assets", names(df1))
    return_col <- grep("^Returns", names(df1))
    colSums(t(t(df1[asset_col])/colSums(df1[asset_col])) * df1[return_col])
    
    #Returns_Jan_2018 Returns_Feb_2018 
    #        3.504230         4.633941 
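
    The question's data frame is not reproduced here, so the snippet above cannot be run as-is. The following is a minimal sketch that assumes df1 has the same shape and values as the example data frame built in the other answer; the contents of df1 and the names weights and result are assumptions for illustration only.

    ```r
    # Hypothetical reconstruction of df1 (values borrowed from the other answer's example).
    df1 <- data.frame(
      Firm             = c("A", "B", "C", "D"),
      Assets_Jan_2018  = c(210234, 123144, 897897, 235467),
      Assets_Feb_2018  = c(235425, 127124, 789798, 897342),
      Returns_Jan_2018 = c(4.5, 5.3, 1.4, 9.7),
      Returns_Feb_2018 = c(6.7, 1.2, 6.2, 3.2)
    )

    # Column positions, selected by name prefix.
    asset_col  <- grep("^Assets", names(df1))
    return_col <- grep("^Returns", names(df1))

    # Each firm's share of total assets per month, then the weighted return per month.
    weights <- t(t(df1[asset_col]) / colSums(df1[asset_col]))
    result  <- colSums(weights * df1[return_col])
    result
    ```

    This relies on the "Assets" and "Returns" columns appearing in the same month order, since they are paired by position rather than by name.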
    

    To break it down and make each step clear:

    Step 1 - For each month I would like to sum the assets columns

    colSums(df1[asset_col])
    #Assets_Jan_2018 Assets_Feb_2018 
    #    1466742         2049689 
    

    Step 2 - For each firm, I would like to divide assets each month by the total for the month

    t(t(df1[asset_col])/colSums(df1[asset_col]))
    #     Assets_Jan_2018 Assets_Feb_2018
    #[1,]      0.14333400      0.11485889
    #[2,]      0.08395751      0.06202112
    #[3,]      0.61217106      0.38532577
    #[4,]      0.16053744      0.43779422
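
    The double transpose in step 2 is needed because R recycles a vector down the columns of a matrix, so dividing by the column totals directly would pair the totals with rows instead of columns. A small sketch with a made-up toy matrix (the names m, totals, naive and the values are hypothetical) shows the difference, along with two base R alternatives, sweep and prop.table:

    ```r
    # Toy matrix (hypothetical values): rows are firms, columns are months.
    m <- matrix(c(1, 3, 6, 10, 30, 60), nrow = 3,
                dimnames = list(NULL, c("Jan", "Feb")))
    totals <- colSums(m)

    # Naive division: `totals` is recycled down each column (element by element),
    # so the months get mixed up and the columns no longer sum to 1.
    naive <- m / totals

    # Transpose trick: after t(), months are rows, so recycling lines up correctly.
    by_transpose <- t(t(m) / totals)

    # Equivalent base R alternatives that state the margin explicitly.
    by_sweep <- sweep(m, 2, totals, "/")
    by_prop  <- prop.table(m, margin = 2)

    colSums(naive)         # not all equal to 1
    colSums(by_transpose)  # each column sums to 1
    ```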
    

    Step 3 - Then I would like to take the values from step (2) and multiply by the corresponding returns

    t(t(df1[asset_col])/colSums(df1[asset_col])) * df1[return_col]
    
    #  Returns_Jan_2018 Returns_Feb_2018
    #1        0.6450030       0.76955455
    #2        0.4449748       0.07442534
    #3        0.8570395       2.38901980
    #4        1.5572131       1.40094151
    

    Step 4 - I would like to sum each column in step (3)

    colSums(t(t(df1[asset_col])/colSums(df1[asset_col])) * df1[return_col])
    
    #Returns_Jan_2018 Returns_Feb_2018 
    #        3.504230         4.633941 
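
    Taken together, steps 1 to 4 compute an asset-weighted mean of the returns for each month, so the result can be cross-checked with weighted.mean. This is a sketch under the assumption that df1 matches the example data from the other answer; df1's contents and the names by_month and by_colsums are hypothetical.

    ```r
    # Hypothetical reconstruction of df1 (values borrowed from the other answer's example).
    df1 <- data.frame(
      Firm             = c("A", "B", "C", "D"),
      Assets_Jan_2018  = c(210234, 123144, 897897, 235467),
      Assets_Feb_2018  = c(235425, 127124, 789798, 897342),
      Returns_Jan_2018 = c(4.5, 5.3, 1.4, 9.7),
      Returns_Feb_2018 = c(6.7, 1.2, 6.2, 3.2)
    )
    asset_col  <- grep("^Assets", names(df1))
    return_col <- grep("^Returns", names(df1))

    # Pair each returns column with the assets column in the same position and
    # use the assets as weights: weighted.mean(x = returns, w = assets).
    by_month <- mapply(weighted.mean, df1[return_col], df1[asset_col])

    # The four-step version from above, for comparison.
    by_colsums <- colSums(t(t(df1[asset_col]) / colSums(df1[asset_col])) * df1[return_col])

    all.equal(by_month, by_colsums)
    ```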
    
  • 2021-01-24 19:12

    Probably a lot less concise than Ronak's answer, but it demonstrates the sweep functionality.

    Construct example df:

    df <- data.frame(cbind(Firm = c("A", "B", "C", "D"),
                           Assets_Jan_2018 = as.numeric(c(210234, 123144, 897897, 235467)),
                           Assets_Feb_2018 = as.numeric(c(235425, 127124, 789798, 897342)),
                           Returns_Jan_2018 = as.double(c(4.5,  5.3,  1.4, 9.7)),
                           Returns_Feb_2018 = as.double(c(6.7, 1.2, 6.2, 3.2))))
    

    Ensure numeric data types:

    df <- type.convert(df, as.is = TRUE)
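
    The conversion step is needed because cbind on a mix of character and numeric vectors builds a character matrix, so every column arrives in the data frame as text. A short sketch of that behaviour, using a hypothetical two-row subset of the data (the names m, raw, fixed and direct are illustrative):

    ```r
    # cbind() coerces all inputs to one common type; with a character column
    # present, the numbers become strings.
    m <- cbind(Firm = c("A", "B"), Assets = c(210234, 123144))
    typeof(m)  # "character"

    # Wrapping the matrix in data.frame() keeps the text, so convert it back.
    raw   <- data.frame(m)
    fixed <- type.convert(raw, as.is = TRUE)  # as.is = TRUE keeps Firm as character
    is.numeric(fixed$Assets)

    # Passing the vectors to data.frame() directly avoids the coercion altogether.
    direct <- data.frame(Firm = c("A", "B"), Assets = c(210234, 123144))
    is.numeric(direct$Assets)
    ```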
    

    Calculate the weighted returns:

    FirmWeightedReturns <- cbind(Firm = df$Firm,
                                 sweep(df[sapply(df, is.numeric) & !(grepl("returns", tolower(colnames(df))))],
                                       2,
                                       as.numeric(sapply(df[sapply(df, is.numeric) & !(grepl("returns", tolower(colnames(df))))], sum)), '/')
                                      * df[grepl("returns", tolower(colnames(df)))])
    

    Summarise as a df:

    TotalReturns <- data.frame(lapply(FirmWeightedReturns[sapply(FirmWeightedReturns, is.numeric)], sum))
    

    Output to console:

    TotalReturns
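
    For readability, the same sweep-based calculation can be split into named steps. This is only a sketch of one way to lay it out, not a replacement for the code above; the helper names is_return, is_asset, weights, weighted and total_returns are hypothetical, and the data frame is built directly so that no type conversion is needed.

    ```r
    # Example data, built directly with data.frame() so the numeric columns stay numeric.
    df <- data.frame(
      Firm             = c("A", "B", "C", "D"),
      Assets_Jan_2018  = c(210234, 123144, 897897, 235467),
      Assets_Feb_2018  = c(235425, 127124, 789798, 897342),
      Returns_Jan_2018 = c(4.5, 5.3, 1.4, 9.7),
      Returns_Feb_2018 = c(6.7, 1.2, 6.2, 3.2)
    )

    # Logical masks for the two groups of columns.
    is_return <- grepl("returns", names(df), ignore.case = TRUE)
    is_asset  <- sapply(df, is.numeric) & !is_return

    # sweep(x, 2, stats, "/") divides every column of x by its matching entry in stats.
    weights <- sweep(df[is_asset], 2, colSums(df[is_asset]), "/")

    # Multiply weights by returns (columns are paired by position) and relabel,
    # because the product keeps the names of the left-hand data frame.
    weighted <- weights * df[is_return]
    names(weighted) <- names(df)[is_return]

    total_returns <- colSums(weighted)
    total_returns
    ```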
    