I feel this should be easy in base R but I just can't figure it out. I have a simple dataframe; let's say it looks like this:
tbl <- read.table(text =
"Field1 Field2
100 200
150 180
200 160
280 250
300 300
300 250",
header = TRUE)
Now, what I want to do is create a function that will apply a rolling % addition, something like:
fn <- function(tbl, pct) {}
which accepts the dataframe above as tbl. It adds a percentage fraction of the current row to the NEXT row down based on pct, and rolls this almost in a cumulative fashion.
For example, fn(tbl$Field1, 0.1) would generate the following results:
100 (100 + 0.1*0 = 100)
160 (150 + 0.1*100 = 160)
216 (200 + 0.1*160 = 216)
301.6 (280 + 0.1*216 = 301.6)
etc.
I'd use a package solution, but would prefer base R as it helps with the learning process! My longer-term goal is to build a process that loops through each combination of field and pct so I can test its effect in a regression model; hence my gut feeling is that a function I can later apply is the way forward.
Thanks.
The filter() function is part of the stats package, which is base R. Keeping to one decimal place:
round(filter(tbl$Field1, 0.1, method="recursive"), 1)
This produces the following results:
100.0 160.0 216.0 301.6 330.2 333.0
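As a rough sketch of how this could be wrapped for reuse (the name roll_pct is made up for illustration, not part of any package): stats::filter() returns a "ts" object, so converting with as.numeric() gives a plain vector that can be assigned to a dataframe column. Writing stats::filter explicitly also avoids picking up dplyr's filter() by accident if that package happens to be loaded.

```r
# Hypothetical wrapper around the recursive filter shown above.
# With method = "recursive", y[i] = x[i] + pct * y[i - 1].
roll_pct <- function(x, pct) {
  # as.numeric() drops the "ts" class that stats::filter() returns
  as.numeric(stats::filter(x, pct, method = "recursive"))
}

field1 <- c(100, 150, 200, 280, 300, 300)
rolled <- roll_pct(field1, 0.1)
print(rolled)
```

The rounding is left out here so the caller can decide how many decimals to keep.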
You can use the Reduce() function, as in the following:
cumpersum = function(x, percent = 0.1) {
  Reduce(function(x1, x2) percent * x1 + x2, x, accumulate = TRUE)
}
dat <- data.frame(
  Field1 = c(100, 150, 200, 280, 300, 300),
  Field2 = c(200, 180, 160, 250, 300, 250)
)
dat$Field1cumper <- cumpersum(dat$Field1, .1)
dat
# Field1 Field2 Field1cumper
# 1 100 200 100.0
# 2 150 180 160.0
# 3 200 160 216.0
# 4 280 250 301.6
# 5 300 300 330.2
# 6 300 250 333.0
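Since the question mentions eventually looping over every combination of field and pct, here is one possible sketch of that step built on the same Reduce() helper. The pct values and the "Field_pct" column naming are assumptions made for illustration only; adjust them to whatever the regression needs.

```r
cumpersum <- function(x, percent = 0.1) {
  Reduce(function(x1, x2) percent * x1 + x2, x, accumulate = TRUE)
}

dat <- data.frame(
  Field1 = c(100, 150, 200, 280, 300, 300),
  Field2 = c(200, 180, 160, 250, 300, 250)
)

pcts <- c(0.1, 0.2)  # hypothetical values to try

# One row per (field, pct) combination
grid <- expand.grid(field = names(dat), pct = pcts, stringsAsFactors = FALSE)

# Apply cumpersum() to each combination, keeping the results as a list
rolled <- mapply(
  function(field, pct) cumpersum(dat[[field]], pct),
  grid$field, grid$pct,
  SIMPLIFY = FALSE
)

# Name each column after its combination, e.g. "Field1_0.1"
names(rolled) <- paste(grid$field, grid$pct, sep = "_")
rolled <- as.data.frame(rolled)
```

Each column of rolled can then be passed to a model in turn, or bound onto dat with cbind().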
If you want a solution in just base R and are learning programming from the very basics, using a for loop and indexes, you can write a function whose body looks like the following:
# pct is the function's argument
solution = tbl$Field1
for (i in seq_along(tbl$Field1)) {
  if (i == 1) {
    # the first row has nothing before it to carry over
    solution[1] = tbl$Field1[1]
  } else {
    solution[i] = tbl$Field1[i] + pct * solution[i-1]
  }
}
though I would recommend taking a look at more advanced solutions. The lag function already mentioned could be handy.
It's tempting to figure out a solution that doesn't involve explicit looping, but I couldn't think of one. You can decompose the desired result into a sum of numbers multiplied by pct^c(0, 1, 2, ...), but I think that just makes you do a lot of extra calculation. So my solution would be simply:
fn = function(x, pct) {
  n = length(x)
  result = NA*x
  last_result = 0
  # seq_len(n) is safe for empty input, unlike 1:n
  for (i in seq_len(n)) {
    result[i] = last_result = x[i] + last_result*pct
  }
  return(result)
}
fn(tbl$Field1, 0.1)
# [1] 100.000 160.000 216.000 301.600 330.160 333.016
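For comparison, the decomposition mentioned above could be sketched roughly as follows: each result is the sum of the current and all earlier values, weighted by decreasing powers of pct, i.e. result[i] = sum over j <= i of pct^(i - j) * x[j]. The name fn_decomposed is made up for illustration. This does on the order of n^2 work rather than n, which is the "extra calculation" referred to.

```r
# Hypothetical illustration of the power-sum decomposition; the loop
# version above is the one I'd actually use.
fn_decomposed <- function(x, pct) {
  vapply(seq_along(x), function(i) {
    weights <- pct^((i - 1):0)  # pct^(i-1), ..., pct^1, pct^0
    sum(x[seq_len(i)] * weights)
  }, numeric(1))
}

decomposed <- fn_decomposed(c(100, 150, 200, 280, 300, 300), 0.1)
print(decomposed)
```

Up to floating-point rounding, this should agree with the output of fn() above.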
Source: https://stackoverflow.com/questions/49112929/rolling-percentage-add-along-column