Question
I have a big data set (around 200k rows) where each row is a loan. I have the loan amount, the number of payments, and the loan payment. I'm trying to get the loan rate. R doesn't have a function for calculating this (at least base R doesn't, and I couldn't find one). It isn't that hard to write both an NPV and an IRR function:
Npv <- function(i, cf, t = seq(from = 0, by = 1, along.with = cf)) sum(cf / (1 + i)^t)
Irr <- function(cf) { uniroot(Npv, c(0, 100000), cf = cf)$root }
And you can just do (note that the loan amount enters as a negative cash flow at time 0, otherwise there is no sign change for uniroot to find):
rate <- Irr(c(-amt, rep(pmt, times = n)))
The problem is when you try to calculate the rate for a lot of loans. Because uniroot is not vectorized, and because rep takes a surprising amount of time, the calculation ends up slow. You can make it faster if you do some math and figure out that you are looking for the root of the following equation
zerome <- function(r) amt/pmt-(1-1/(1+r)^n)/r
and then use that as input for uniroot. On my PC, this takes around 20 seconds to run for my 200k database.
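For reference, here is a minimal sketch of how that per-loan uniroot call might look over the whole data set (the data frame loans with columns amt, pmt and n, and the bracketing interval, are my assumptions, not part of the original question):
# Sketch: solve each loan's rate by finding the root of the closed-form
# annuity equation. The data frame `loans` and the interval c(1e-9, 10)
# are illustrative assumptions.
solve_rate <- function(amt, pmt, n) {
  zerome <- function(r) amt / pmt - (1 - 1 / (1 + r)^n) / r
  uniroot(zerome, interval = c(1e-9, 10))$root
}
rates <- mapply(solve_rate, loans$amt, loans$pmt, loans$n)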
The problem is that this is one step of an optimization I'm running, so I'm trying to speed it up even more.
I've tried vectorization, but because uniroot is not vectorized, I can't go further that way. Is there any root-finding method that is vectorized?
Thanks
Answer 1:
Instead of using a root finder, you could use a linear interpolator. You will have to create one interpolator for each value of n (the number of remaining payments). Each interpolator will map (1-1/(1+r)^n)/r to r. Of course you will have to build a grid fine enough so it will return r to an acceptable precision level. The nice thing with this approach is that linear interpolators are fast and vectorized: you can find the rates for all loans with the same number of remaining payments (n) in a single call to the corresponding interpolator.
Now some code that proves it is a viable solution:
First, we create the interpolators, one for each possible value of n:
n.max <- 360L # 30 years of monthly payments
one.interpolator <- function(n) {
  # Tabulate the annuity factor (1-1/(1+r)^n)/r on a fine grid of rates
  # and invert the mapping with a linear interpolator.
  r <- seq(from = 0.0001, to = 0.1500, by = 0.0001)
  y <- (1 - 1 / (1 + r)^n) / r
  approxfun(y, r)
}
interpolators <- lapply(seq_len(n.max), one.interpolator)
Note that I used a precision of 1/100 of a percent (1bp).
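As a quick sanity check (my own example, not part of the original answer), a single interpolator maps the ratio amt/pmt back to the per-period rate:
interpolators[[360L]](160)  # roughly 0.0053, i.e. about 0.53% per period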
Then we create some fake data:
n.loans <- 200000L
n <- sample(n.max, n.loans, replace = TRUE)             # remaining payments
amt <- 1000 * sample(100:500, n.loans, replace = TRUE)  # loan amounts
pmt <- amt / (n * (1 - runif(n.loans)))                 # random payment sizes
loans <- data.frame(n, amt, pmt)
Finally, we solve for r:
library(plyr)
system.time(ddply(loans, "n", transform, r = interpolators[[n[1]]](amt / pmt)))
# user system elapsed
# 2.684 0.423 3.084
It's fast. Note that some of the output rates are NA, but that is because my random inputs made no sense and would have returned rates outside of the [0 ~ 15%] grid I selected. Your real data won't have that problem.
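If you would rather avoid the plyr dependency, a minimal base-R equivalent of the ddply call above (my sketch, reusing the same variable names) loops over the distinct values of n and applies the matching interpolator to each group in one vectorized call:
loans$r <- NA_real_
for (k in unique(loans$n)) {
  idx <- loans$n == k
  # approxfun objects accept vectors, so each group is solved in one call
  loans$r[idx] <- interpolators[[k]](loans$amt[idx] / loans$pmt[idx])
}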
Source: https://stackoverflow.com/questions/13868990/fast-loan-rate-calculation-for-a-big-number-of-loans