lm | 易学教程

Why predicted polynomial changes drastically when only the resolution of prediction grid changes?


Why does the exact same model give different predictions when only the step of the prediction grid changes (0.001 vs. 0.01)?

    set.seed(0)
    n_data=2000
    x=runif(n_data)-0.5
    y=0.1*sin(x*30)/x+runif(n_data)
    plot(x,y)
    poly_df=5
    x_exp=as.data.frame(cbind(y,poly(x, poly_df)))
    fit=lm(y~.,data=x_exp)
    x_plt1=seq(-1,1,0.001)
    x_plt_exp1=as.data.frame(poly(x_plt1,poly_df))
    lines(x_plt1,predict(fit,x_plt_exp1),lwd=3,col=2)
    x_plt2=seq(-1,1,0.01)
    x_plt_exp2=as.data.frame(poly(x_plt2,poly_df))
    lines(x_plt2,predict(fit,x_plt_exp2),lwd=3,col=3)

李哲源: This is a coding / programming problem as on my quick run I
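The answer above is cut off, so the following is only a sketch of one likely explanation, not the answerer's own fix. By default `poly()` builds an orthogonal basis from whatever vector it receives, so `poly(x_plt1, poly_df)` and `poly(x_plt2, poly_df)` each produce a basis different from the one used in fitting. Putting `poly()` inside the formula lets `predict()` reuse the basis stored with the fit. The names `grid_fine`, `grid_coarse`, and `shared` are introduced here for illustration only.

```r
set.seed(0)
n_data <- 2000
x <- runif(n_data) - 0.5
y <- 0.1 * sin(x * 30) / x + runif(n_data)
poly_df <- 5

# poly() inside the formula: the basis coefficients are stored with the fit
fit <- lm(y ~ poly(x, poly_df))

# Two prediction grids that differ only in resolution
grid_fine   <- data.frame(x = seq(-1, 1, 0.001))
grid_coarse <- data.frame(x = seq(-1, 1, 0.01))
pred_fine   <- unname(predict(fit, grid_fine))
pred_coarse <- unname(predict(fit, grid_coarse))

# Every 10th point of the fine grid coincides with a coarse grid point
shared <- seq(1, nrow(grid_fine), by = 10)
max_diff <- max(abs(pred_fine[shared] - pred_coarse))
print(max_diff)
```

With this setup the two curves should coincide at the shared x values up to floating-point error, because both grids are evaluated in the same polynomial basis.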

Multi-way interaction: easy way to get numerical coefficient estimates?


Question: Let's say there's a 4-way interaction, with a 2x2x2 factorial design plus a continuous variable. Factors have the default contrast coding (contr.treatment). Here's an example:

    set.seed(1)
    cat1 <- as.factor(sample(letters[1:2], 1000, replace = TRUE))
    cat2 <- as.factor(sample(letters[3:4], 1000, replace = TRUE))
    cat3 <- as.factor(sample(letters[5:6], 1000, replace = TRUE))
    cont1 <- rnorm(1000)
    resp <- rnorm(1000)
    df <- data.frame(cat1, cat2, cat3, cont1, resp)
    mod <- lm(resp ~ cont1 * cat1 *

linear regression in R without copying data in memory?


The standard way of doing a linear regression is something like this:

    l <- lm(Sepal.Width ~ Petal.Length + Petal.Width, data=iris)

and then use predict(l, new_data) to make predictions, where new_data is a dataframe with columns matching the formula. But lm() returns an lm object, which is a list that contains crap-loads of stuff that is mostly irrelevant in most situations. This includes a copy of the original data, and a bunch of named vectors and arrays the length/size of the data:

    R> str(l)
    List of 12
     $ coefficients : Named num [1:3] 3.587 -0.257 0.364
      ..- attr(*, "names")= chr [1:3] "
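The excerpt stops before any answer, so the following is only a sketch of one possible approach: fit with `lm.fit()` on a design matrix and retain nothing but the coefficient vector. The helper `predict_coef()` is hypothetical and introduced here for illustration. Note that `model.matrix()` still builds a design matrix during fitting; the saving is in what is kept afterwards.

```r
form <- Sepal.Width ~ Petal.Length + Petal.Width

# Fit on the design matrix and keep only the coefficients
X <- model.matrix(form, data = iris)
beta <- lm.fit(X, iris$Sepal.Width)$coefficients
rm(X)  # the design matrix is not needed after fitting

# Hypothetical helper: predict from coefficients alone
predict_coef <- function(beta, form, new_data) {
  # delete.response() drops the left-hand side, so new_data
  # only needs the predictor columns
  X_new <- model.matrix(delete.response(terms(form)), data = new_data)
  drop(X_new %*% beta)
}

new_data <- head(iris[, c("Petal.Length", "Petal.Width")])
pred <- predict_coef(beta, form, new_data)
print(pred)
```

A lighter alternative, if the full `lm` interface is still wanted, is `lm(..., model = FALSE)`, which omits the stored model frame but keeps the residuals, fitted values, and QR decomposition.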

lm() called within mutate()


I wonder if it is possible to use lm() within mutate() of the dplyr package. Currently I have a dataframe of "date", "company", "return" and "market.ret", reproducible as below:

    library(dplyr)
    n.dates <- 60
    n.stocks <- 2
    date <- seq(as.Date("2011-07-01"), by=1, len=n.dates)
    symbol <- replicate(n.stocks, paste0(sample(LETTERS, 5), collapse = ""))
    x <- expand.grid(date, symbol)
    x$return <- rnorm(n.dates*n.stocks, 0, sd = 0.05)
    names(x) <- c("date", "company", "return")
    x <- group_by(x, date)
    x <- mutate(x, market.ret = mean(x$return, na.rm = TRUE))

Now for each company I would like to fit "return" by

Add regression line (and goodness-of-fit stats) to scatterplot


After reviewing other Stack Overflow posts, I am attempting to add a regression line to my scatter plot with:

    plot(subdata2$PeakToGone, subdata2$NO3_AVG, xlim = c(0, 70))
    abline(lm(PeakToGone~NO3_AVG, data = subdata2))

However, it is not showing the line. I would also like to add the R^2, RMSE, and p-value from lm as text on the plot. How can I add the regression line to the plot, along with these goodness-of-fit stats?

lukeA: By default, plot regards the 1st param as x and the 2nd as y. Try

    plot(y = subdata2$PeakToGone, x = subdata2$NO3_AVG, xlim = c(0, 70))
    abline(lm(PeakToGone~NO3_AVG, data
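The answer is truncated before it reaches the goodness-of-fit part, so here is a minimal sketch of how the line and the three statistics could be drawn together. The data frame below is synthetic and only stands in for `subdata2`, whose real contents are not shown; the p-value used is that of the slope, which in a one-predictor model equals the overall F-test p-value.

```r
# Synthetic stand-in for subdata2 (assumption: the real data are not available)
set.seed(1)
subdata2 <- data.frame(NO3_AVG = runif(50, 0, 70))
subdata2$PeakToGone <- 5 + 0.8 * subdata2$NO3_AVG + rnorm(50, sd = 6)

fit <- lm(PeakToGone ~ NO3_AVG, data = subdata2)
s <- summary(fit)

r2    <- s$r.squared
rmse  <- sqrt(mean(residuals(fit)^2))
p_val <- coef(s)["NO3_AVG", "Pr(>|t|)"]

# The formula interface keeps the plot axes consistent with the model:
# response on y, predictor on x
plot(PeakToGone ~ NO3_AVG, data = subdata2, xlim = c(0, 70))
abline(fit)
legend("topleft", bty = "n",
       legend = c(sprintf("R^2 = %.3f", r2),
                  sprintf("RMSE = %.2f", rmse),
                  sprintf("p = %.3g", p_val)))
```

Using the same formula in both `plot()` and `lm()` avoids the axis mix-up that hid the line in the original call.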

predict.glm() with three new categories in the test data (r)(error)


I have a data set called data which has 481 092 rows. I split data into two equal halves: the first half (rows 1 : 240 546) is called train and was used for the glm(); the second half (rows 240 547 : 481 092) is called test and should be used to validate the model. Then I started the regression:

    testreg <- glm(train$returnShipment ~ train$size + train$color + train$price + train$manufacturerID + train$salutation + train$state + train$age + train$deliverytime, family=binomial(link="logit"), data=train)

Now the prediction:

    prediction <- predict.glm(testreg, newdata=test, type="response")

gives
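The excerpt ends before the error message, so the following is a sketch based on two assumptions drawn from the title and the code: the `train$` prefixes in the formula make `predict()` look up the training columns instead of those in `newdata`, and factor levels that appear only in `test` cannot be scored by the fitted model. The toy data and the reduced set of columns are hypothetical.

```r
set.seed(42)
n <- 200

# Toy stand-ins for train and test (hypothetical columns and values)
train <- data.frame(
  color = factor(sample(c("red", "blue", "green"), n, replace = TRUE)),
  price = runif(n, 10, 100)
)
train$returnShipment <- rbinom(n, 1, plogis(-1 + 0.02 * train$price))

test <- data.frame(
  color = factor(c("red", "blue", "purple")),  # "purple" is unseen in train
  price = c(20, 50, 80)
)

# Bare column names in the formula, so newdata is actually used
fit <- glm(returnShipment ~ color + price,
           family = binomial(link = "logit"), data = train)

# Re-level the test factor with the training levels;
# values outside those levels become NA
test$color <- factor(as.character(test$color), levels = levels(train$color))

prediction <- predict(fit, newdata = test, type = "response")
print(prediction)
```

Rows with an unseen category come back as `NA` rather than stopping the call. Whether to drop those rows, map them to an existing level, or refit on data that contains them is a modelling decision.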


Rolling regression and prediction with lm() and predict()


I need to apply lm() to a growing subset of my dataframe dat, while making a prediction for the next observation. For example, I am doing:

    fit model                  predict
    ----------                 -------
    dat[1:3, ]                 dat[4, ]
    dat[1:4, ]                 dat[5, ]
    .                          .
    .                          .
    dat[1:(nrow(dat)-1), ]     dat[nrow(dat), ]

I know what I should do for a particular subset (related to this question: predict() and newdata - How does this work?). For example, to predict the last row, I do

    dat1 = dat[1:(nrow(dat)-1), ]
    dat2 = dat[nrow(dat), ]
    fit = lm(log(clicks) ~ log(v1) + log(v12), data=dat1)
    predict.fit = predict(fit, newdata=dat2, se.fit=TRUE)

How can I do this
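One way to automate the expanding-window scheme is to loop over the end row of the training window; the sketch below is an illustration, not the accepted answer, which is not part of this excerpt. The data are synthetic stand-ins for `dat`, and `min_rows` is a name introduced here. It starts at 4 rather than 3 because a three-parameter model fitted to three rows has no residual degrees of freedom, so its standard errors are undefined.

```r
# Synthetic stand-in for dat (assumption: the real data are not shown)
set.seed(1)
n <- 20
dat <- data.frame(v1 = runif(n, 1, 10), v12 = runif(n, 1, 10))
dat$clicks <- exp(0.5 + 0.3 * log(dat$v1) + 0.2 * log(dat$v12) +
                  rnorm(n, sd = 0.1))

min_rows <- 4  # smallest training window

# For each window dat[1:i, ], fit the model and predict row i + 1
rolling <- t(sapply(min_rows:(nrow(dat) - 1), function(i) {
  fit <- lm(log(clicks) ~ log(v1) + log(v12), data = dat[1:i, ])
  p <- predict(fit, newdata = dat[i + 1, ], se.fit = TRUE)
  c(row = i + 1, fit = unname(p$fit), se = unname(p$se.fit))
}))

print(head(rolling))
```

Each row of `rolling` holds the index of the predicted observation, the predicted `log(clicks)`, and its standard error.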

Column wise granger's causal tests in R


Question: I have 2 matrices of different parameters, M1 and M3, with the same dimensions. I'd like to do a column-wise grangertest in R.

    M1<- matrix( c(2,3, 1, 4, 3, 3, 1,1, 5, 7), nrow=5, ncol=2)
    M3<- matrix( c(1, 3, 1,5, 7,3, 1, 3, 3, 4), nrow=5, ncol=2)

I want to do a Granger causality test to determine if M2 Granger-causes M1. My actual matrices contain more columns and rows, but this is just an example. The original code between two vectors is below:

    library(lmtest)
    data(ChickEgg)
    grangertest

How to run linear model in R with certain data range?


Question: I run a linear model on my dataset, which has 2 columns and 100 rows. How can I run the model on a certain range of the data, e.g. from row 30 to row 80?

    set.seed(123) # allow reproducible random numbers
    A <- data.frame(x=rnorm(100), y=runif(100)) # 2 columns with 100 rows of data
    fit.lm <- lm(A$x~A$y) # fit 100 data
    summary(fit.lm) # summary 100 data

Thanks in advance.

Answer 1: For example,

    lm(x~y,data = A[30:80,])

Or, using the subset parameter:

    lm(x~y,data=A,subset=30:80)

Source: https:/