问题
I am trying to do a rowSum
for the actuals columns. However, I would like to include the values up to the UpTo
date for certain observations. Here is the data frame:
dat <- structure(list(Company = c("ABC", "DEF", "XYZ"), UpTo = c(NA,
"Q2", "Q3"), Actual.Q1 = c(100L, 80L, 100L), Actual.Q2 = c(50L,
75L, 50L), Forecast.Q3 = c(80L, 50L, 80L), Forecast.Q4 = c(90L,
80L, 100L)), .Names = c("Company", "UpTo", "Actual.Q1", "Actual.Q2",
"Forecast.Q3", "Forecast.Q4"), class = "data.frame", row.names = c("1",
"2", "3"))
Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4
1 ABC NA 100 50 80 90
2 DEF Q2 80 75 50 80
3 XYZ Q3 100 50 80 100
- For company
ABC
, since there is noUpTo
date, it will just beActual.Q1
+Actual.Q2
, which is 150. - For company
DEF
, since theUpTo
date isQ2
, it will beActual.Q1
+Actual.Q2
, which is 155. - For company
XYZ
, since theUpTo
date isQ3
, it will beActual.Q1
+Actual.Q2
+Forecast.Q3
, which is 230.
The resulting data frame would look like this:
Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4 SumRecent
1 ABC NA 100 50 80 90 150
2 DEF Q2 80 75 50 80 155
3 XYZ Q3 100 50 80 100 230
I have tried to use the rowSums
function. However, it does not take into effect the variable UpTo
. Any help is appreciated. Thanks!
回答1:
Here is a possibility:
df$SumRecent <- sapply(1:nrow(df), function(x) {sum(df[x,3:ifelse(is.na(grep(df[x,2], colnames(df))[1]), 4, grep(df[x,2], colnames(df))[1])])})
# Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4 SumRecent
# 1 ABC <NA> 100 50 80 90 150
# 2 DEF Q2 80 75 50 80 155
# 3 XYZ Q3 100 50 80 100 230
We are looking with the use of grep
for a match of the value in the column UpTo
(df[x,2]
) in the column names of df
(colnames(df)
). If we find it we get the sum, if we don't find it we just sum the values in columns 3 and 4.
回答2:
We can use binary weighted row sums.
UpTo <- as.character(dat$UpTo) ## in case you have factor column
UpTo[is.na(UpTo)] <- "Q2" ## replace `NA` to "Q2"
w <- outer(as.integer(substr(UpTo, 2, 2)), 1:4, ">=")
# [,1] [,2] [,3] [,4]
#[1,] TRUE TRUE FALSE FALSE
#[2,] TRUE TRUE FALSE FALSE
#[3,] TRUE TRUE TRUE FALSE
We have a logical matrix. But it does not affect arithmetic computation as TRUE
is 1 and FALSE
is 0. Then we do weighted row sums:
X <- data.matrix(dat[3:6])
dat$SumRecent <- rowSums(X * w)
# Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4 SumRecent
#1 ABC <NA> 100 50 80 90 150
#2 DEF Q2 80 75 50 80 155
#3 XYZ Q3 100 50 80 100 230
The advantage of this approach is its speed / efficiency, as it is fully vectorized. This method is super fast. You can refer to the benchmark result in Fast way to create a binary matrix with known number of 1 each row in R.
回答3:
This should also work:
df$UpTo <- as.character(df$UpTo)
df$SumRecent <- apply(df, 1, function(x) ifelse(is.na(x[2]), sum(as.integer(x[3:4])),
sum(as.integer(x[3:(grep(x[2], names(df)))]))))
df
# Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4 SumRecent
#1 ABC <NA> 100 50 80 90 150
#2 DEF Q2 80 75 50 80 155
#3 XYZ Q3 100 50 80 100 230
回答4:
Another approach using data table:
require(data.table)
dat <- fread('Company UpTo Actual.Q1 Actual.Q2 Forecast.Q3 Forecast.Q4
ABC NA 100 50 80 90
DEF Q2 80 75 50 80
XYZ Q3 100 50 80 100')
dat[, SumRecent:= ifelse(is.na(UpTo), Actual.Q1 + Actual.Q2,
sum(.SD[, grepl(paste0("Q[1-", substring(UpTo, 2), "]$"), names(.SD)), with = F]) ), by = Company]
来源:https://stackoverflow.com/questions/40771358/ragged-rowsums-in-r