lm | 易学教程

Function which runs lm over different variables

Question: I would like to create a function that can run a regression model (e.g. using lm) over different variables in a given dataset. In this function, I would specify as arguments the dataset I'm using, the dependent variable y and the independent variable x. I want this to be a function and not a loop, as I would like to call the code in various places of my script. My naive function would look something like this:

lmfun <- function(data, y, x) {
  lm(y ~ x, data = data)
}

This function obviously
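A minimal sketch of one common approach, assuming the variables are passed as column names (strings) rather than bare symbols. `lmfun` here is a hypothetical rewrite of the naive version above. It builds the formula with `reformulate()` so that `lm()` looks the names up in `data`:

```r
# Hypothetical variant of lmfun(): y and x are passed as column names (strings)
lmfun <- function(data, y, x) {
  # reformulate() builds the formula `y ~ x1 + x2 + ...` from character vectors
  f <- reformulate(termlabels = x, response = y)
  lm(f, data = data)
}

# One call per variable, reusing the same function
fit_wt <- lmfun(mtcars, "mpg", "wt")
fits <- lapply(c("wt", "hp", "disp"), function(v) lmfun(mtcars, "mpg", v))
```

Passing strings sidesteps non-standard evaluation entirely. A `c("wt", "hp")` vector for `x` gives a multiple regression with the same function.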

Difference between categorical variables (factors) and dummy variables

Question: I was running a regression using categorical variables and came across this question, in which the user wanted to add a column for each dummy. This left me quite confused, because I thought that having long data with one column holding all the dummies, stored using as.factor(), was equivalent to having dummy variables. Could someone explain the difference between the following two linear regression models? Linear Model 1, where Month is a factor:

dt_long
       Sales Period Month
1: 0.4898943      1    M1
2: 0
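A sketch of the comparison, on hypothetical data in the spirit of `dt_long` (random sales, twelve months observed three times, `Period` omitted). It fits the model once with `Month` as a factor and once with dummy columns built by hand via `model.matrix()`. With the same baseline month dropped, the two fits should give identical coefficients:

```r
set.seed(1)
# Hypothetical long data: 3 years of monthly sales
months <- paste0("M", 1:12)
dt_long <- data.frame(
  Sales = runif(36),
  Month = factor(rep(months, times = 3), levels = months)
)

# Model 1: Month stored as a factor; lm() creates the dummies internally
m1 <- lm(Sales ~ Month, data = dt_long)

# Model 2: the same dummies built by hand (M1 is the omitted baseline)
dummies <- model.matrix(~ Month, data = dt_long)[, -1]
dt_wide <- cbind(dt_long["Sales"], dummies)
m2 <- lm(Sales ~ ., data = dt_wide)

all.equal(coef(m1), coef(m2))
```

Storing the column as a factor lets `lm()` generate the dummy columns itself. The explicit columns mainly matter when a tool cannot handle factors.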

Updating a linear regression model with update and purrr

Question: I want to update an lm model using the update function inside a map call, but this throws the following error:

mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(lm1 = map(data, ~lm(mpg ~ wt, data = .x)),
         lm2 = map(lm1, ~update(object = .x, formula = .~ . + hp)))

Error in mutate_impl(.data, dots) :
  Evaluation error: cannot coerce class ‘"lm"’ to a data.frame.

Can anyone help me with this problem? I am confused by this error because, for example, this works totally fine:

mtcars %>% group_by(cyl) %>%
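A likely explanation: `update()` re-evaluates the stored call `lm(mpg ~ wt, data = .x)` in the calling frame. Inside the second `map()`, `.x` is the lm object, not the data. The sketch below, assuming dplyr, tidyr and purrr are installed, passes each group's data explicitly with `map2()`:

```r
library(dplyr)
library(tidyr)
library(purrr)

models <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(
    lm1 = map(data, ~ lm(mpg ~ wt, data = .x)),
    # Supply the group's data as .y so update() does not re-evaluate
    # the stored `data = .x`, which here would be the lm object itself
    lm2 = map2(lm1, data, ~ update(.x, . ~ . + hp, data = .y))
  )
```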

Linear regression with product of factor and independent variable

Question: I am trying to estimate a demand model: d_t^k = a_t - b^k p_t^k + e_t^k. The index t is the week number and k is the product number. The demand for each product, d_t^k, depends on the general seasonality a_t shared by all products, is an affine function of the product's price in that week, p_t^k, and includes a normal random error e_t^k. However, if I use the following lm function call, it gives me a single coefficient b for price, when what I want is one coefficient per product
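A sketch on simulated data. Three hypothetical products have assumed price sensitivities b^k, and weekly seasonality a_t is shared by all products. Writing price only through the interaction `product:price`, with no `price` main effect, makes R give each product its own price slope. `factor(week)` estimates the shared a_t:

```r
set.seed(42)
n_weeks  <- 20
products <- c("A", "B", "C")
true_b   <- c(A = 1.5, B = 0.5, C = 2.0)   # hypothetical b^k values

df  <- expand.grid(week = 1:n_weeks, product = products)
a_t <- rnorm(n_weeks, mean = 50, sd = 5)   # shared weekly seasonality a_t
df$price  <- runif(nrow(df), 5, 15)
df$demand <- a_t[df$week] -
  unname(true_b[as.character(df$product)]) * df$price +
  rnorm(nrow(df), sd = 1)                  # e_t^k

# One price coefficient per product, plus week effects for a_t
fit <- lm(demand ~ factor(week) + product:price, data = df)
slopes <- coef(fit)[paste0("product", products, ":price")]
```

Because the model is d = a - b·p, the estimated slopes are estimates of -b^k.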

Use of offset in lm regression - R

Question: I have this program:

dens <- read.table('DensPiu.csv', header = FALSE)
fl <- read.table('FluxPiu.csv', header = FALSE)
mydata <- data.frame(c(dens), c(fl))
dat = subset(mydata, dens >= 3.15)
colnames(dat) <- c("x", "y")
attach(dat)

and I want to do a least-squares regression on the data contained in dat. The function has the form y ~ a + b*x, and I want the regression line to pass through a specific point P(x0, y0) (which is not the origin). I'm trying to do it like this:

x0 <- 3.15
y0 <- 283.56
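One way to force the line through P(x0, y0) is to rewrite the model as y = y0 + b(x - x0). Then the only free parameter is b, and y0 enters as a fixed offset. The sketch uses hypothetical simulated data in place of the CSV files:

```r
set.seed(1)
# Hypothetical data standing in for 'dat' (columns x and y)
dat <- data.frame(x = seq(3.15, 5, length.out = 30))
dat$y <- 283.56 + 40 * (dat$x - 3.15) + rnorm(30, sd = 2)

x0 <- 3.15
y0 <- 283.56

# Constrained model: y = y0 + b * (x - x0) + error.
# '0 +' drops the free intercept; the offset fixes the line's height at x0.
fit <- lm(y ~ 0 + I(x - x0), offset = rep(y0, nrow(dat)), data = dat)

b <- unname(coef(fit)[1])
a <- y0 - b * x0   # intercept of y = a + b*x implied by the constraint
```

An equivalent formulation without `offset` is `lm(I(y - y0) ~ 0 + I(x - x0), data = dat)`. It gives the same b.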

r functions calling lm with subsets

Question: I was working on some code and noticed something peculiar. When I run lm on a subset of some panel data I have, it works fine, something like this:

library('plm')
data(Cigar)
lm(log(price) ~ log(pop) + log(ndi), data=Cigar, subset=Cigar$state==1)

Call:
lm(formula = log(price) ~ log(pop) + log(ndi), data = Cigar, subset = Cigar$state == 1)

Coefficients:
(Intercept)     log(pop)     log(ndi)
   -26.4919       3.2749       0.4265

but when I try to wrap this in a function I get:

myfunction <- function(formula, data,
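A common cause of this kind of failure: `lm()` evaluates its `subset` argument non-standardly. It looks the expression up in `data` and the formula's environment, not in the wrapper's frame, so a forwarded `subset` may resolve to the wrong object. A minimal sketch of a workaround subsets the data frame before calling `lm()`. The example uses `mtcars` rather than `Cigar` to avoid the plm dependency, and this `myfunction` is a hypothetical version:

```r
# Hypothetical wrapper: subset the data frame first, then fit,
# instead of forwarding 'subset' into lm()'s non-standard evaluation
myfunction <- function(formula, data, subset) {
  lm(formula, data = data[subset, , drop = FALSE])
}

fit_fun <- myfunction(mpg ~ wt, data = mtcars, subset = mtcars$cyl == 4)

# Same fit done directly at top level, for comparison
fit_top <- lm(mpg ~ wt, data = mtcars, subset = cyl == 4)
```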

Predict.lm in R fails to recognize newdata

Question: I'm running a linear regression where the predictor is categorized by another variable, and I am having trouble generating modeled responses for newdata. First, I generate some random values for the predictor and the error terms. I then construct the response. Note that the predictor's coefficient depends on the value of a categorical variable. I compose a design matrix based on the predictor and its category.

set.seed(1)
category = c(rep("red", 5), rep("blue", 5))
x1 = rnorm(10, mean = 1, sd = 1)
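A frequent reason `predict()` seems to ignore `newdata` is that the model was fit on a prebuilt design matrix or on `df$col` terms. The formula's variable names then do not match the columns of `newdata`. A sketch that avoids this fits from a data frame with the interaction in the formula. The slopes 2 (red) and -1 (blue) are hypothetical:

```r
set.seed(1)
category <- c(rep("red", 5), rep("blue", 5))
x1 <- rnorm(10, mean = 1, sd = 1)
# Hypothetical category-specific slopes: 2 for red, -1 for blue
y <- ifelse(category == "red", 2, -1) * x1 + rnorm(10, sd = 0.1)

train <- data.frame(y = y, x1 = x1, category = factor(category))
# x1:category without an x1 main effect gives one slope per category
fit <- lm(y ~ x1:category, data = train)

# newdata must have columns with the same names (and factor levels)
newdata <- data.frame(
  x1 = c(0.5, 1.5),
  category = factor(c("red", "blue"), levels = levels(train$category))
)
p <- predict(fit, newdata = newdata)
```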

predict() and newdata - How does this work?

Question: Someone recently posted a question about this paper: https://static.googleusercontent.com/media/www.google.com/en//googleblogs/pdfs/google_predicting_the_present.pdf The paper's R code can be found at the very end of the paper. Essentially, the paper investigates one-month-ahead predictions of sales from search queries. I think I understood the model and method, but one detail puzzles me. It's this part:

##### Divide data by two parts - model fitting & prediction
dat1
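As a general illustration of how `newdata` works (not the paper's code), `predict()` takes the fitted coefficients and looks up the formula's variables by name in `newdata`. The split of `mtcars` below into a fitting part and a prediction part is hypothetical:

```r
# Hypothetical split: fit on the first 24 rows, predict the remaining 8
fit_part  <- mtcars[1:24, ]
pred_part <- mtcars[25:32, ]

fit <- lm(mpg ~ wt, data = fit_part)

# predict() finds 'wt' in pred_part and applies the fitted coefficients
pred <- predict(fit, newdata = pred_part)
```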

Plotting a line of best fit from where data starts to where data ends in R

Question: I am trying to plot a line of best fit on my dataset in R: abline(lm(y~x)). However, the line goes all the way across the entire graph. Is there any way to configure the line so that it only covers the area where the data points are (similar to what you get in Excel)? Many thanks!

Answer 1: A solution would be to use lines() with two predictions, one at each extreme of x. See this example:

x <- rnorm(20)
y <- 5 + 0.4*x + rnorm(20)/10
dt <- data.frame(x=x, y=y)
ols1 <- lm(y ~ x, data=dt)
nd
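A self-contained sketch in the spirit of that answer, not its exact code. It predicts only at the smallest and largest observed x and joins those two points with `lines()`. The data frame `ends` is a hypothetical name:

```r
set.seed(1)
x <- rnorm(20)
y <- 5 + 0.4 * x + rnorm(20) / 10
dt <- data.frame(x = x, y = y)
ols1 <- lm(y ~ x, data = dt)

# Predict only at the minimum and maximum observed x
ends <- data.frame(x = range(dt$x))
ends$y <- predict(ols1, newdata = ends)

plot(y ~ x, data = dt)
lines(ends$x, ends$y)   # fitted line limited to the data's x range
```

Two points suffice because the fitted relationship is a straight line.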