lm

Function which runs lm over different variables

六月ゝ 毕业季﹏ posted on 2020-01-02 05:22:55
Question: I would like to create a function that runs a regression model (e.g. using lm) over different variables in a given dataset. As arguments, I would specify the dataset, the dependent variable y, and the independent variable x. I want this to be a function rather than a loop because I would like to call the code in various places in my script. My naive function would look something like this: lmfun <- function(data, y, x) { lm(y ~ x, data = data) } This function obviously
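
A minimal sketch of one common way to approach this, assuming the variable names are passed as character strings. It builds the formula with `reformulate()` before calling `lm`; the `mtcars` calls are illustrative only and are not part of the original question.

```r
# Build the formula from character names, then fit the model.
lmfun <- function(data, y, x) {
  f <- reformulate(termlabels = x, response = y)  # e.g. mpg ~ wt
  lm(f, data = data)
}

# Illustrative usage on a built-in dataset (an assumption, not the asker's data)
fit_one <- lmfun(mtcars, y = "mpg", x = "wt")
fit_two <- lmfun(mtcars, y = "mpg", x = c("wt", "hp"))
coef(fit_one)
```

Passing names as strings keeps the function easy to call from `lapply` or similar helpers; a version that accepts unquoted names would need non-standard evaluation instead.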

Difference between categorical variables (factors) and dummy variables

徘徊边缘 posted on 2020-01-01 12:00:15
Question: I was running a regression using categorical variables and came across this question. Here, the user wanted to add a column for each dummy. This left me quite confused, because I thought that having long data with one column holding all the categories, stored using as.factor(), was equivalent to having dummy variables. Could someone explain the difference between the following two linear regression models? Linear Model 1, where Month is a factor: dt_long Sales Period Month 1: 0.4898943 1 M1 2: 0
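
A small sketch that may help illustrate the comparison, using made-up data (the `Sales` values and the three month labels here are assumptions, not the asker's table). With the default treatment contrasts, `lm` expands a factor into dummy columns internally, so fitting on the factor and fitting on explicit dummies should give the same coefficients.

```r
set.seed(1)
# Made-up long-format data: one factor column holding the month
dt <- data.frame(
  Sales = runif(12),
  Month = factor(rep(c("M1", "M2", "M3"), times = 4))
)

# Model 1: Month as a factor
fit_factor <- lm(Sales ~ Month, data = dt)

# Model 2: explicit dummy columns (M1 is the reference level and is dropped)
dummies <- model.matrix(~ Month, data = dt)[, -1]
dt_wide <- cbind(dt["Sales"], as.data.frame(dummies))
fit_dummy <- lm(Sales ~ MonthM2 + MonthM3, data = dt_wide)

all.equal(unname(coef(fit_factor)), unname(coef(fit_dummy)))
```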

Updating a linear regression model with update and purrr

江枫思渺然 posted on 2020-01-01 06:30:07
Question: I want to update an lm model using the update function inside a map call, but this throws the following error: mtcars %>% group_by(cyl) %>% nest() %>% mutate(lm1 = map(data, ~lm(mpg ~ wt, data = .x)), lm2 = map(lm1, ~update(object = .x, formula = .~ . + hp))) Error in mutate_impl(.data, dots) : Evaluation error: cannot coerce class "lm" to a data.frame. Can anyone help me with this problem? I am confused about this error, because e.g. this works totally fine: mtcars %>% group_by(cyl) %>%
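
A hedged sketch of what appears to be going on: `update()` re-evaluates the stored call `lm(mpg ~ wt, data = .x)`, and inside the second `map` the name `.x` refers to the fitted model rather than the data. One way around this is to pass the data explicitly when updating. The version here uses base R (`split`, `lapply`, `Map`) to stay dependency-free; the helper names `groups`, `lm1`, and `lm2` are illustrative.

```r
# Split the data by cylinder count and fit the first model per group
groups <- split(mtcars, mtcars$cyl)
lm1 <- lapply(groups, function(d) lm(mpg ~ wt, data = d))

# Update each model, supplying the matching data explicitly so the
# re-evaluated call does not depend on a name from the earlier scope
lm2 <- Map(function(fit, d) update(fit, . ~ . + hp, data = d), lm1, groups)

lapply(lm2, coef)
```

The same idea should carry over to purrr by iterating over the model and data columns together, for example with `map2(lm1, data, ~ update(.x, . ~ . + hp, data = .y))`.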

Linear regression with product of factor and independent variable

女生的网名这么多〃 posted on 2019-12-31 03:54:08
Question: I am trying to estimate a demand model: d_t^k = a_t - b^k p_t^k + e_t^k. The index t is the week number and k is the product number. The demand for each product, d_t^k, depends on the general seasonality a_t that is shared by all the products, and is an affine function of the price of the product in that week, p_t^k, plus some normal random error e_t^k. However, if I use the following lm function call, it gives me a single coefficient b for price, when what I want is one coefficient per product
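
A minimal sketch of one way to get a separate price coefficient per product, assuming `week` and `product` are stored as factors. The data are simulated from the stated model (the column names, sample sizes, and parameter values are all assumptions). The interaction term `product:price`, without a `price` main effect, yields one slope per product, and `0 + week` gives one intercept a_t per week.

```r
set.seed(42)
# Simulated panel: 10 weeks x 3 products (illustrative values only)
dat <- expand.grid(week = factor(1:10), product = factor(c("A", "B", "C")))
dat$price <- runif(nrow(dat), min = 1, max = 5)

a <- rnorm(10, mean = 20, sd = 2)   # weekly seasonality a_t
b <- c(A = 1, B = 2, C = 3)         # product-specific price sensitivity b^k
dat$demand <- a[as.integer(dat$week)] -
  b[as.character(dat$product)] * dat$price +
  rnorm(nrow(dat), sd = 0.1)

# One intercept per week, one price slope per product
fit <- lm(demand ~ 0 + week + product:price, data = dat)

# Estimated slopes correspond to -b^k under the sign convention of the model
slopes <- coef(fit)[grepl(":price$", names(coef(fit)))]
slopes
```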

Use of offset in lm regression - R

狂风中的少年 posted on 2019-12-30 06:51:13
Question: I have this program: dens <- read.table('DensPiu.csv', header = FALSE) fl <- read.table('FluxPiu.csv', header = FALSE) mydata <- data.frame(c(dens),c(fl)) dat = subset(mydata, dens>=3.15) colnames(dat) <- c("x", "y") attach(dat) and I want to do a least-squares regression on the data contained in dat. The function has the form y ~ a + b*x, and I want the regression line to pass through a specific point P(x0,y0) (which is not the origin). I'm trying to do it like this: x0 <- 3.15 y0 <- 283.56
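
A sketch of two equivalent ways to force the line through P(x0, y0), using simulated data in place of the asker's CSV files (the data, the true slope of 40, and the column name `off` are assumptions). Shifting both variables by the point and dropping the intercept gives y - y0 = b (x - x0); the same fit can be written with `offset()`.

```r
set.seed(1)
x0 <- 3.15
y0 <- 283.56

# Simulated stand-in for the asker's data
dat <- data.frame(x = runif(30, min = 3.15, max = 6))
dat$y <- y0 + 40 * (dat$x - x0) + rnorm(30, sd = 5)

# Variant 1: shift the response and predictor, no intercept
fit_shift <- lm(I(y - y0) ~ 0 + I(x - x0), data = dat)

# Variant 2: keep y as the response and add y0 as a fixed offset
dat$off <- y0
fit_offset <- lm(y ~ 0 + I(x - x0) + offset(off), data = dat)

b <- unname(coef(fit_shift))  # slope
a <- y0 - b * x0              # implied intercept of y = a + b * x
c(a = a, b = b)
```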

r functions calling lm with subsets

筅森魡賤 posted on 2019-12-29 09:05:50
Question: I was working on some code and I noticed something peculiar. When I run lm on a subset of some panel data, it works fine, something like this: library('plm') data(Cigar) lm(log(price) ~ log(pop) + log(ndi), data=Cigar, subset=Cigar$state==1) Call: lm(formula = log(price) ~ log(pop) + log(ndi), data = Cigar, subset = Cigar$state == 1) Coefficients: (Intercept) log(pop) log(ndi) -26.4919 3.2749 0.4265 but when I try to wrap this in a function I get: myfunction <- function(formula, data,
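
A hedged sketch of one workaround. `lm` evaluates its `subset` argument in `data` and then in the environment of the formula, so a variable that exists only inside a wrapper function may not be found. Subsetting the data frame before calling `lm` sidesteps this. The function name `fit_subset`, its `keep` argument, and the `mtcars` example are illustrative and are not the asker's code.

```r
# Subset the rows first, then fit on the reduced data frame
fit_subset <- function(formula, data, keep) {
  lm(formula, data = data[keep, , drop = FALSE])
}

# Illustrative usage on a built-in dataset
fit <- fit_subset(mpg ~ wt, data = mtcars, keep = mtcars$cyl == 4)
coef(fit)
```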

Predict.lm in R fails to recognize newdata

故事扮演 posted on 2019-12-29 02:00:54
Question: I'm running a linear regression where the predictor is categorized by another value, and I am having trouble generating modeled responses for newdata. First, I generate some random values for the predictor and the error terms. I then construct the response. Note that the predictor's coefficient depends on the value of a categorical variable. I compose a design matrix based on the predictor and its category. set.seed(1) category = c(rep("red", 5), rep("blue",5)) x1 = rnorm(10, mean = 1, sd = 1)
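
A sketch of one setup in which `predict()` picks up `newdata` as expected. `predict.lm` looks up the variables named in the formula inside `newdata`, so fitting with named columns and a formula interaction (rather than a hand-built design matrix) keeps the names aligned. The response construction here (slopes 2 and -1, noise sd 0.1) is an assumption, since the original excerpt is cut off.

```r
set.seed(1)
category <- c(rep("red", 5), rep("blue", 5))
x1 <- rnorm(10, mean = 1, sd = 1)

# Assumed response: the slope on x1 depends on the category
y <- ifelse(category == "red", 2, -1) * x1 + rnorm(10, sd = 0.1)
dat <- data.frame(y = y, x1 = x1, category = factor(category))

# Let lm build the design matrix from named columns
fit <- lm(y ~ x1:category, data = dat)

# newdata must use the same column names and factor levels
newdata <- data.frame(
  x1 = c(0, 1, 2),
  category = factor(c("red", "blue", "red"), levels = levels(dat$category))
)
pred <- predict(fit, newdata = newdata)
pred
```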

predict() and newdata - How does this work?

大城市里の小女人 posted on 2019-12-25 07:59:31
Question: Someone recently posted a question on this paper here: https://static.googleusercontent.com/media/www.google.com/en//googleblogs/pdfs/google_predicting_the_present.pdf The R code can be found at the very end of the paper. Essentially, the paper investigates one-month-ahead predictions of sales through search queries. I think I understood the model and method, but there's one detail that puzzles me. It's the part: 1 ##### Divide data by two parts - model fitting & prediction dat1
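
A generic sketch of the fit-then-predict split the excerpt refers to, not the paper's actual code. The data are simulated, and the column names `sales` and `queries` and the names `dat1` and `dat2` are assumptions: the model is fitted on all but the last month, and `predict()` is then given the held-out row as `newdata`.

```r
set.seed(123)
n <- 24
# Simulated monthly data (illustrative only)
dat <- data.frame(month = 1:n, queries = runif(n, min = 50, max = 100))
dat$sales <- 10 + 0.5 * dat$queries + rnorm(n)

dat1 <- dat[1:(n - 1), ]  # rows used for model fitting
dat2 <- dat[n, ]          # held-out row used for prediction

fit <- lm(sales ~ queries, data = dat1)

# Prediction for the held-out month uses only that row's predictors
pred <- predict(fit, newdata = dat2)
c(predicted = unname(pred), actual = dat2$sales)
```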

Plotting a line of best fit from where data starts to where data ends in R

依然范特西╮ posted on 2019-12-25 06:38:10
Question: I am trying to plot a line of best fit on my dataset in R: abline(lm(y~x)) However, the line goes all the way across the entire graph. Is there any way that I can configure the line so that it only covers the area where the data points are (similar to what you get in Excel)? Many thanks! Answer 1: A solution would be to use lines() and have two predictions for both extremes of x. See this example: x <- rnorm(20) y <- 5 + 0.4*x + rnorm(20)/10 dt <- data.frame(x=x, y=y) ols1 <- lm(y ~ x, data=dt) nd
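
A sketch of how the approach described in the answer could be completed, since the excerpt is cut off. The `set.seed` call and the data frame name `ends` are additions for illustration: the fitted line is evaluated only at the smallest and largest x, and `lines()` draws the segment between those two points.

```r
set.seed(1)  # added for reproducibility; not in the original answer
x <- rnorm(20)
y <- 5 + 0.4 * x + rnorm(20) / 10
dt <- data.frame(x = x, y = y)
ols1 <- lm(y ~ x, data = dt)

# Predict only at the two extremes of x
ends <- data.frame(x = range(dt$x))
ends$y <- predict(ols1, newdata = ends)

# Scatter plot with a fitted segment limited to the data range
plot(dt$x, dt$y, xlab = "x", ylab = "y")
lines(ends$x, ends$y, col = "red")
```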