Question
I have built a linear regression in R. Now I want to store the model and use it to score a new data set once a week.
Can someone help me with how to do this?
How do I save the model, and how do I load it and apply it to a new data set?
Answer 1:
You can save the model in a file and load it when you need it.
For example, you might have a line like this to train your model:
the_model <- glm(my_formula, family=binomial(link='logit'),data=training_set)
This model can be saved with:
save(file="modelfile",the_model) #the name of the file is of course arbitrary
Later, assuming that the file is in the working directory, you can reuse that model by first loading it with:
load(file="modelfile")
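A self-contained sketch of the save/load round trip, assuming a plain lm() fit on simulated data. The data, the formula y ~ x and the use of tempfile() are illustrative stand-ins, not part of the original answer:

```r
# Hypothetical training data standing in for your real training set
set.seed(42)
training_set <- data.frame(x = rnorm(50))
training_set$y <- 2 + 3 * training_set$x + rnorm(50)

# Train the model and keep a copy of its coefficients for comparison
the_model <- lm(y ~ x, data = training_set)
original_coef <- coef(the_model)

# Save to a temporary file (in practice, a fixed path such as "modelfile")
model_path <- tempfile(fileext = ".RData")
save(the_model, file = model_path)

# Simulate a fresh session: remove the object, then restore it from disk
rm(the_model)
load(file = model_path)

# load() recreated `the_model` under its original name, unchanged
identical(coef(the_model), original_coef)  # TRUE
```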
The model can then be applied to a new data set, test_set, e.g.:
test_set$pred <- predict(the_model, newdata=test_set, type='response')
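For the linear regression in the question, the same call works with an lm() model. type='response' matters for glm(), where it returns predicted probabilities instead of the default log-odds; for lm() "response" is already the default and the argument can be omitted. A minimal sketch of the weekly scoring step, assuming a model saved earlier to a file and new data containing the same predictor column x (the data and file location are hypothetical):

```r
# Hypothetical one-off setup: fit and save a linear model
set.seed(1)
training_set <- data.frame(x = runif(30, 0, 10))
training_set$y <- 1 + 0.5 * training_set$x + rnorm(30, sd = 0.2)
the_model <- lm(y ~ x, data = training_set)
model_path <- file.path(tempdir(), "modelfile")
save(the_model, file = model_path)

# --- Weekly scoring job ---
load(file = model_path)  # restores `the_model`

# New data must contain every predictor used in the formula (here: x)
test_set <- data.frame(x = c(2, 5, 8))
test_set$pred <- predict(the_model, newdata = test_set)
print(test_set)
```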
Note that the result of load() should not be assigned to a variable (don't write something like the_model <- load("modelfile")). load() recreates the model under the name it was saved with, in this case the_model, and returns only a character vector of the loaded object names, so such an assignment would overwrite the model with the string "the_model". Also, the model remains exactly as it was saved: the new observations do not change the coefficients or anything else in the model; the "old" model is simply applied to make predictions on the new data.
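To see this concretely, here is a small sketch that inspects what load() returns and, for comparison, uses base R's saveRDS()/readRDS() pair (an alternative not used in this answer), which stores a single object and returns it as a value, so it can be assigned to any name you like. The built-in cars data and the tempfile() paths are stand-ins for your own model and file:

```r
fit <- lm(dist ~ speed, data = cars)  # built-in data as a stand-in model

# save()/load(): the object comes back under its saved name
rdata_path <- tempfile(fileext = ".RData")
save(fit, file = rdata_path)
rm(fit)
loaded_names <- load(rdata_path)  # restores `fit` as a side effect
print(loaded_names)               # "fit": only the name, not the model

# saveRDS()/readRDS(): one object, returned as a value, so assign it freely
rds_path <- tempfile(fileext = ".rds")
saveRDS(fit, rds_path)
weekly_model <- readRDS(rds_path)
identical(coef(weekly_model), coef(fit))  # TRUE
```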
If, however, you have an additional labeled data set and want to retrain or improve the model using these new observations, you can follow the suggestions in the answer by @David.
Hope this helps.
Answer 2:
You can use the update function:
set.seed(1)
dat <- data.frame(x = rnorm(100),
y = rnorm(100, 0.01))
lmobj <- lm(y~x, dat)
coef(lmobj)
# (Intercept) x
# -0.027692614 -0.001060386
dat2 <- data.frame(x = rnorm(10),
y = rnorm(10, 0.01))
lmobj2 <- update(lmobj, data = dat2)  # name the argument: update()'s 2nd positional argument is formula., not data
coef(lmobj2)
#--------------------------------
# to make things a bit more clear:
# update() re-runs the original call lm(y ~ x, dat) with dat2 as the data,
# so lmobj2 is fitted on the 10 new rows only; it is the same model as
lmobj3 <- lm(y~x, dat2)
coef(lmobj3)
#(Intercept) x
#-0.02386837 0.06973995
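If the weekly labeled batch should add to the existing training data rather than replace it, one option (not part of this answer) is to refit the same formula on the old and new rows together by passing data = rbind(dat, dat2) to update(). A sketch reusing this answer's simulated dat and dat2, assuming the original training data is still available:

```r
set.seed(1)
dat  <- data.frame(x = rnorm(100), y = rnorm(100, 0.01))  # original training data
dat2 <- data.frame(x = rnorm(10),  y = rnorm(10, 0.01))   # new labeled batch

lmobj <- lm(y ~ x, dat)

# Refit the same formula on all rows, old and new
lmobj_all <- update(lmobj, data = rbind(dat, dat2))

# Same result as fitting from scratch on the combined data
lmobj_check <- lm(y ~ x, rbind(dat, dat2))
all.equal(coef(lmobj_all), coef(lmobj_check))  # TRUE
nobs(lmobj_all)                                # 110
```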
Source: https://stackoverflow.com/questions/33677488/using-an-already-created-model-for-scoring-a-new-data-set-in-r