lm

Why does the predicted polynomial change drastically when only the resolution of the prediction grid changes?

 ̄綄美尐妖づ posted on 2019-12-01 22:24:45
Why do I get different predictions from the exact same model when I only change the prediction grid (step 0.001 vs. step 0.01)?

    set.seed(0)
    n_data = 2000
    x = runif(n_data) - 0.5
    y = 0.1 * sin(x * 30) / x + runif(n_data)
    plot(x, y)
    poly_df = 5
    x_exp = as.data.frame(cbind(y, poly(x, poly_df)))
    fit = lm(y ~ ., data = x_exp)
    x_plt1 = seq(-1, 1, 0.001)
    x_plt_exp1 = as.data.frame(poly(x_plt1, poly_df))
    lines(x_plt1, predict(fit, x_plt_exp1), lwd = 3, col = 2)
    x_plt2 = seq(-1, 1, 0.01)
    x_plt_exp2 = as.data.frame(poly(x_plt2, poly_df))
    lines(x_plt2, predict(fit, x_plt_exp2), lwd = 3, col = 3)

李哲源: This is a coding / programming problem as on my quick run I
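The answer above is cut off, so what follows is an assumption about the cause rather than a restatement of it. poly() builds an orthogonal basis from whatever vector it receives, so calling poly() separately on each grid yields a different basis per grid, and the fitted coefficients no longer match the columns they are multiplied with. A minimal sketch of one common way around this is to put poly() inside the formula and pass the raw grid as newdata, so that predict() reuses the basis stored at fit time. The grid range of -0.5 to 0.5 is chosen here to stay inside the data range and is not from the original post.

```r
set.seed(0)
n_data <- 2000
x <- runif(n_data) - 0.5
y <- 0.1 * sin(x * 30) / x + runif(n_data)

# poly() inside the formula: the basis coefficients are stored with the fit
fit <- lm(y ~ poly(x, 5))

# Two grids of different resolution over the same range (range is an assumption)
grid_fine <- seq(-0.5, 0.5, by = 0.001)
grid_coarse <- seq(-0.5, 0.5, by = 0.01)

# newdata holds the raw x values; predict() rebuilds the same basis for both
pred_fine <- predict(fit, newdata = data.frame(x = grid_fine))
pred_coarse <- predict(fit, newdata = data.frame(x = grid_coarse))

# Every 10th point of the fine grid coincides with a coarse grid point
shared <- seq(1, length(grid_fine), by = 10)
max_gap <- max(abs(pred_fine[shared] - pred_coarse))
```

With this setup the two curves should coincide wherever the grids share points, so max_gap is expected to be at the level of floating-point noise.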

Multi-way interaction: easy way to get numerical coefficient estimates?

丶灬走出姿态 posted on 2019-12-01 21:12:15
Question: Let's say there's a 4-way interaction, with a 2x2x2 factorial design plus a continuous variable. Factors have the default contrast coding (contr.treatment). Here's an example:

    set.seed(1)
    cat1 <- as.factor(sample(letters[1:2], 1000, replace = TRUE))
    cat2 <- as.factor(sample(letters[3:4], 1000, replace = TRUE))
    cat3 <- as.factor(sample(letters[5:6], 1000, replace = TRUE))
    cont1 <- rnorm(1000)
    resp <- rnorm(1000)
    df <- data.frame(cat1, cat2, cat3, cont1, resp)
    mod <- lm(resp ~ cont1 * cat1 *
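The model formula above is truncated, so the sketch below assumes the full four-way formula resp ~ cont1 * cat1 * cat2 * cat3. One possible way to read off a per-cell intercept and slope directly, instead of summing treatment-contrast terms by hand, is to refit the same model in a cell-wise parameterisation. The variable name cell is hypothetical.

```r
set.seed(1)
cat1 <- as.factor(sample(letters[1:2], 1000, replace = TRUE))
cat2 <- as.factor(sample(letters[3:4], 1000, replace = TRUE))
cat3 <- as.factor(sample(letters[5:6], 1000, replace = TRUE))
cont1 <- rnorm(1000)
resp <- rnorm(1000)
df <- data.frame(cat1, cat2, cat3, cont1, resp)

# Assumed full model (the original formula is cut off after the second '*')
mod_full <- lm(resp ~ cont1 * cat1 * cat2 * cat3, data = df)

# One factor with a level per design cell (2 x 2 x 2 = 8 levels)
df$cell <- interaction(df$cat1, df$cat2, df$cat3)

# Same fit, reparameterised: one intercept and one cont1 slope per cell
mod_cell <- lm(resp ~ 0 + cell + cell:cont1, data = df)
cell_coefs <- coef(mod_cell)
```

Both models span the same column space, so their fitted values agree; only the meaning of the coefficients changes. In mod_cell, the terms named cell... are cell intercepts and the cell...:cont1 terms are the cell-specific slopes.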

linear regression in R without copying data in memory?

雨燕双飞 posted on 2019-12-01 20:10:25
The standard way of doing a linear regression is something like this:

    l <- lm(Sepal.Width ~ Petal.Length + Petal.Width, data=iris)

and then use predict(l, new_data) to make predictions, where new_data is a data frame with columns matching the formula. But lm() returns an lm object, which is a list that contains crap-loads of stuff that is mostly irrelevant in most situations. This includes a copy of the original data, and a bunch of named vectors and arrays the length/size of the data:

    R> str(l)
    List of 12
     $ coefficients : Named num [1:3] 3.587 -0.257 0.364
      ..- attr(*, "names")= chr [1:3] "
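As a partial mitigation, and not a full answer to the memory question, lm() has a model argument that controls whether the model frame is kept in the returned object. A minimal sketch comparing the two, where new_data is a hypothetical stand-in made from the first rows of iris:

```r
# Default fit keeps the model frame inside the object
l_full <- lm(Sepal.Width ~ Petal.Length + Petal.Width, data = iris)

# model = FALSE drops the stored model frame
l_small <- lm(Sepal.Width ~ Petal.Length + Petal.Width, data = iris,
              model = FALSE)

# Hypothetical new data for prediction
new_data <- iris[1:5, ]
pred_full <- predict(l_full, new_data)
pred_small <- predict(l_small, new_data)

# Compare in-memory sizes of the two fitted objects
size_full <- as.numeric(object.size(l_full))
size_small <- as.numeric(object.size(l_small))
```

Predictions are unaffected, but the smaller object still carries residuals, fitted values and the QR decomposition, all of which scale with the number of rows.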

lm() called within mutate()

末鹿安然 posted on 2019-12-01 18:28:42
I wonder if it is possible to use lm() within mutate() of dplyr package. Currently I have a dataframe of "date", "company", "return" and "market.ret" reproducible as below:

    library(dplyr)
    n.dates <- 60
    n.stocks <- 2
    date <- seq(as.Date("2011-07-01"), by=1, len=n.dates)
    symbol <- replicate(n.stocks, paste0(sample(LETTERS, 5), collapse = ""))
    x <- expand.grid(date, symbol)
    x$return <- rnorm(n.dates*n.stocks, 0, sd = 0.05)
    names(x) <- c("date", "company", "return")
    x <- group_by(x, date)
    x <- mutate(x, market.ret = mean(x$return, na.rm = TRUE))

Now for each company I would like to fit "return" by
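The question is cut off before it names the regressor, so this sketch assumes the goal is to regress return on market.ret separately for each company. It deliberately uses base R (split() plus sapply()) rather than dplyr's mutate(), as a dependency-free way to show the per-group fit. The company names and the simulated market.ret values are hypothetical.

```r
set.seed(1)
n.dates <- 60

# Hypothetical data: two companies sharing one simulated market return series
x <- data.frame(
  company = rep(c("AAAAA", "BBBBB"), each = n.dates),
  market.ret = rep(rnorm(n.dates, 0, sd = 0.05), times = 2)
)
x$return <- x$market.ret + rnorm(nrow(x), 0, sd = 0.05)

# Fit one regression per company and keep the slope on market.ret
betas <- sapply(split(x, x$company), function(d) {
  coef(lm(return ~ market.ret, data = d))[["market.ret"]]
})

# Attach each company's slope to its rows, much as a grouped mutate() would
x$beta <- betas[as.character(x$company)]
```

The last line gives every row the slope of its own company, which is the shape of result a grouped mutate() would produce.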

Add regression line (and goodness-of-fit stats) to scatterplot

ぃ、小莉子 posted on 2019-12-01 14:41:09
After reviewing other stackoverflow posts, I am attempting to add a regression line to my scatter plot with:

    plot(subdata2$PeakToGone, subdata2$NO3_AVG, xlim = c(0, 70))
    abline(lm(PeakToGone~NO3_AVG, data = subdata2))

However, it is not showing the line. I would also like to add the R^2, RMSE, and p-value from lm as text on the plot. How can I add the regression line to the plot, along with these goodness-of-fit stats?

lukeA: By default, plot regards the 1st param as x and the 2nd as y. Try

    plot(y = subdata2$PeakToGone, x = subdata2$NO3_AVG, xlim = c(0, 70))
    abline(lm(PeakToGone~NO3_AVG, data
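For the second half of the question (putting R^2, RMSE and the p-value on the plot), here is a rough sketch of one way it could be done with base graphics. The data frame below is a simulated stand-in for subdata2, which is not shown in the post, and the legend position is an arbitrary choice.

```r
set.seed(1)
# Hypothetical stand-in for the poster's subdata2
subdata2 <- data.frame(NO3_AVG = runif(50, 0, 70))
subdata2$PeakToGone <- 5 + 0.4 * subdata2$NO3_AVG + rnorm(50, sd = 4)

fit <- lm(PeakToGone ~ NO3_AVG, data = subdata2)
s <- summary(fit)

r2 <- s$r.squared                                 # coefficient of determination
rmse <- sqrt(mean(residuals(fit)^2))              # root mean squared residual
p_val <- s$coefficients["NO3_AVG", "Pr(>|t|)"]    # p-value of the slope

# Formula interface keeps the plot axes consistent with the model
plot(PeakToGone ~ NO3_AVG, data = subdata2, xlim = c(0, 70))
abline(fit)
legend("topleft", bty = "n",
       legend = c(sprintf("R^2 = %.3f", r2),
                  sprintf("RMSE = %.3f", rmse),
                  sprintf("p = %.3g", p_val)))
```

Using the same formula in plot() and lm() avoids the axis mix-up described in the answer, because both calls then agree on which variable is the response.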

predict.glm() with three new categories in the test data (r)(error)

倾然丶 夕夏残阳落幕 posted on 2019-12-01 14:13:34
I have a data set called data which has 481 092 rows. I split data into two equal halves: the first half (rows 1 : 240 546) is called train and was used for the glm(); the second half (rows 240 547 : 481 092) is called test and should be used to validate the model. Then I started the regression:

    testreg <- glm(train$returnShipment ~ train$size + train$color + train$price + train$manufacturerID + train$salutation + train$state + train$age + train$deliverytime, family=binomial(link="logit"), data=train)

Now the prediction:

    prediction <- predict.glm(testreg, newdata=test, type="response")

gives
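The error text is cut off above, so this is a sketch under two assumptions: that the formula should use bare column names so that newdata is actually used, and that rows whose factor levels were never seen in training are to be set aside before predicting. The data are simulated and the column names color and price are only borrowed from the post.

```r
set.seed(1)
# Hypothetical training data with two colour levels
train <- data.frame(
  returnShipment = rbinom(200, 1, 0.5),
  color = factor(sample(c("red", "blue"), 200, replace = TRUE)),
  price = runif(200)
)
# Hypothetical test data containing a level ("green") unseen in training
test <- data.frame(
  color = factor(c("red", "green", "blue")),
  price = c(0.25, 0.50, 0.75)
)

# Bare column names in the formula, so predict() can look them up in newdata
fit <- glm(returnShipment ~ color + price,
           family = binomial(link = "logit"), data = train)

# Flag rows whose level the model has never seen, then predict on the rest
unseen <- !(as.character(test$color) %in% fit$xlevels$color)
test_ok <- test[!unseen, , drop = FALSE]
test_ok$color <- factor(test_ok$color, levels = fit$xlevels$color)
pred <- predict(fit, newdata = test_ok, type = "response")
```

Dropping the unseen rows is only one option. Whether to drop them, map them to an existing level, or refit with those levels present depends on the application.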

predict.glm() with three new categories in the test data (r)(error)

一曲冷凌霜 posted on 2019-12-01 13:21:36
Question: I have a data set called data which has 481 092 rows. I split data into two equal halves: the first half (rows 1 : 240 546) is called train and was used for the glm(); the second half (rows 240 547 : 481 092) is called test and should be used to validate the model. Then I started the regression:

    testreg <- glm(train$returnShipment ~ train$size + train$color + train$price + train$manufacturerID + train$salutation + train$state + train$age + train$deliverytime, family=binomial(link="logit"),

Rolling regression and prediction with lm() and predict()

好久不见. posted on 2019-12-01 09:25:38
I need to apply lm() to an expanding subset of my data frame dat, while making a prediction for the next observation. For example, I am doing:

    fit model             predict
    ----------            -------
    dat[1:3, ]            dat[4, ]
    dat[1:4, ]            dat[5, ]
    .                     .
    .                     .
    dat[-nrow(dat), ]     dat[nrow(dat), ]

I know what I should do for a particular subset (related to this question: predict() and newdata - How does this work?). For example, to predict the last row, I do

    dat1 = dat[1:(nrow(dat)-1), ]
    dat2 = dat[nrow(dat), ]
    fit = lm(log(clicks) ~ log(v1) + log(v12), data=dat1)
    predict.fit = predict(fit, newdata=dat2, se.fit=TRUE)

How can I do this
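The scheme in the table above can be sketched as a simple loop over the end index of the training window. The data frame below is simulated, since dat is not shown in the post, and only the column names clicks, v1 and v12 come from the question. The starting window of 3 rows follows the table.

```r
set.seed(42)
n <- 20
# Hypothetical stand-in for the poster's dat
dat <- data.frame(v1 = runif(n, 1, 10), v12 = runif(n, 1, 10))
dat$clicks <- exp(0.5 * log(dat$v1) + 0.2 * log(dat$v12) + rnorm(n, sd = 0.1))

first <- 3  # size of the smallest training window, as in the table

# For each window end i: fit on rows 1..i, predict row i + 1
preds <- sapply(first:(n - 1), function(i) {
  fit <- lm(log(clicks) ~ log(v1) + log(v12), data = dat[1:i, ])
  unname(predict(fit, newdata = dat[i + 1, ]))
})

# Align predictions (on the log scale) with the rows they refer to
result <- data.frame(row = (first + 1):n, pred_log_clicks = preds)
```

Refitting from scratch at every step is the simplest option and is fine for small data. With only 3 rows and 3 coefficients the first fit has no residual degrees of freedom, so standard errors from se.fit = TRUE would not be meaningful for that window.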

Column wise granger's causal tests in R

余生长醉 posted on 2019-12-01 06:51:55
Question: I have 2 matrices of different parameters, M1 and M3, with the same dimensions. I'd like to do a column-wise grangertest in R.

    M1 <- matrix(c(2, 3, 1, 4, 3, 3, 1, 1, 5, 7), nrow=5, ncol=2)
    M3 <- matrix(c(1, 3, 1, 5, 7, 3, 1, 3, 3, 4), nrow=5, ncol=2)

I want to do a Granger causality test to determine if M3 Granger-causes M1. My actual matrices contain more columns and rows, but this is just an example. The original code between two vectors is below:

    library(lmtest)
    data(ChickEgg)
    grangertest

How to run linear model in R with certain data range?

南楼画角 posted on 2019-12-01 05:57:41
Question: I run a linear model on my dataset, which has 2 columns and 100 rows. How could I run the model for a certain data range, e.g. from row 30 to row 80?

    set.seed(123) # allow reproducible random numbers
    A <- data.frame(x=rnorm(100), y=runif(100)) # 2 columns with 100 rows of data
    fit.lm <- lm(A$x~A$y) # fit 100 data
    summary(fit.lm) # summary 100 data

Thanks in advance.

Answer 1: For example,

    lm(x~y, data = A[30:80,])

Or using the subset parameter:

    lm(x~y, data=A, subset=30:80)

Source: https:/
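As a quick check of the two forms given in the answer, the sketch below fits both and compares them. The object names fit_rows and fit_subset are hypothetical. Note that both forms rely on bare column names in the formula. With A$x ~ A$y, as in the question, the variables are taken from the full vectors and ignore the row-sliced data argument.

```r
set.seed(123)  # allow reproducible random numbers
A <- data.frame(x = rnorm(100), y = runif(100))

# Form 1: slice the rows before fitting
fit_rows <- lm(x ~ y, data = A[30:80, ])

# Form 2: let lm() do the slicing via its subset argument
fit_subset <- lm(x ~ y, data = A, subset = 30:80)

# Both fits use the same 51 rows and give the same coefficients
same_coefs <- isTRUE(all.equal(coef(fit_rows), coef(fit_subset)))
```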