lm

Linear regression in R with if statement [duplicate]

谁说胖子不能爱 submitted on 2019-11-27 08:31:37
Question: This question already has an answer here: How to run linear model in R with certain data range? (1 answer) Closed 3 years ago. I have a dummy variable black, where black==0 is White and black==1 is Black. I am trying to fit a linear model with lm for the black==1 category only; however, running the code below gives me incorrect coefficients. Is there a way in R to run a model with an if statement, similar to Stata?

    library(foreign)
    df<-read.dta("hw4.dta")
    attach(df)
    black[black==0]<-NA
    model3
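
A minimal sketch of the usual R counterpart to Stata's `if`, namely the `subset` argument of `lm()`. The data set and the model formula are cut off in the question above, so the data frame and the variables `wage` and `educ` below are hypothetical stand-ins.

```r
# Hypothetical data: the original hw4.dta and the model formula are not shown.
set.seed(1)
df <- data.frame(black = rep(c(0, 1), each = 50),
                 educ  = rnorm(100, mean = 12, sd = 2))
df$wage <- 5 + 0.8 * df$educ - 1.5 * df$black + rnorm(100)

# Restrict the fit to black == 1 through the subset argument (no attach() needed)
fit_subset <- lm(wage ~ educ, data = df, subset = black == 1)

# Equivalent: filter the rows of the data frame before fitting
fit_filtered <- lm(wage ~ educ, data = df[df$black == 1, ])

coef(fit_subset)
coef(fit_filtered)
```

Both calls use only the rows with black == 1 and leave the original data frame unmodified, which avoids the side effects of `attach()` and of overwriting values with NA.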

How `poly()` generates orthogonal polynomials? How to understand the “coefs” returned?

丶灬走出姿态 submitted on 2019-11-27 08:28:50
My understanding of orthogonal polynomials is that they take the form

    y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6)...

up to the number of terms desired, where a1, a2, etc. are the coefficients of each orthogonal term (they vary between fits), and c1, c2, etc. are coefficients within the orthogonal terms, determined such that the terms maintain orthogonality (they are consistent between fits using the same x values). I understand poly() is used to fit orthogonal polynomials. An example:

    x = c(1.160, 1.143, 1.126, 1.109, 1.079, 1.053, 1.040, 1.027, 1.015, 1.004, 0.994, 0.985, 0.977) #
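
A short sketch for inspecting what `poly()` returns, using the x values from the question. It only checks properties of the result (orthonormal columns, the "coefs" attribute, reuse of the basis through `predict()`); the choice of degree 3 and the new x values are assumptions, and the sketch does not reproduce the internal recurrence.

```r
x <- c(1.160, 1.143, 1.126, 1.109, 1.079, 1.053, 1.040, 1.027,
       1.015, 1.004, 0.994, 0.985, 0.977)

P <- poly(x, degree = 3)   # degree 3 is an arbitrary choice for illustration

# The columns are orthonormal: their cross-product is the identity matrix
G <- crossprod(P)
round(G, 10)

# Each column also sums to zero, i.e. it is orthogonal to the intercept column
colSums(P)

# The "coefs" attribute is a list with components alpha and norm2
str(attr(P, "coefs"))

# predict() reuses those stored coefficients to evaluate the same basis at new x
predict(P, newdata = c(1.00, 1.10))
```

The stored `alpha` and `norm2` values are what allow the same basis to be rebuilt for new data, which is why predictions from a model fitted with `poly()` stay consistent.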

linear model with `lm`: how to get prediction variance of sum of predicted values

有些话、适合烂在心里 submitted on 2019-11-27 06:20:39
Question: I'm summing the predicted values from a linear model with multiple predictors, as in the example below, and want to calculate the combined variance, standard error and possibly confidence intervals for this sum.

    lm.tree <- lm(Volume ~ poly(Girth,2), data = trees)

Suppose I have a set of Girths:

    newdat <- list(Girth = c(10,12,14,16))

for which I want to predict the total Volume:

    pr <- predict(lm.tree, newdat, se.fit = TRUE)
    total <- sum(pr$fit)
    # [1] 111.512

How can I obtain the variance for
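
One way to approach this, sketched under the assumption that the quantity of interest is the sum of the fitted mean responses: the fitted values at the new points have covariance matrix Xp V Xp', where V is `vcov()` of the model, and the variance of their sum is the sum of all entries of that matrix. The names `Xp`, `V_fit`, `se_total` and `ci_total` are introduced here for illustration.

```r
lm.tree <- lm(Volume ~ poly(Girth, 2), data = trees)
newdat  <- data.frame(Girth = c(10, 12, 14, 16))
pr      <- predict(lm.tree, newdat, se.fit = TRUE)

# Design matrix at the new points, built from the model's own terms so that
# poly() is evaluated with the coefficients stored at fitting time
tt <- delete.response(terms(lm.tree))
Xp <- model.matrix(tt, model.frame(tt, newdat))

# Covariance matrix of the four fitted means
V_fit <- Xp %*% vcov(lm.tree) %*% t(Xp)

total     <- sum(pr$fit)
var_total <- sum(V_fit)            # 1' V_fit 1: includes the covariances
se_total  <- sqrt(var_total)

# 95% confidence interval for the sum of the mean responses
ci_total <- total + c(-1, 1) * qt(0.975, df.residual(lm.tree)) * se_total
ci_total
```

Summing the squared `se.fit` values alone would ignore the covariances between the fitted values. For a prediction interval on the sum of four new observations, rather than on the sum of means, the residual variance would have to be added once per observation, assuming independent errors.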

How to minimize size of object of class “lm” without compromising it being passed to predict()

主宰稳场 submitted on 2019-11-27 06:09:22
Question: I want to run lm() on a large dataset with 50M+ observations and 2 predictors. The analysis is run on a remote server with only 10GB for storing the data. I have tested lm() on 10K observations sampled from the data, and the resulting object was 2GB+ in size. I need the object of class "lm" returned from lm() ONLY to produce the summary statistics of the model (summary(lm_object)) and to make predictions (predict(lm_object)). I have done some experiments with the options model, x, y, qr of
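
A small sketch of the `model`, `x` and `y` arguments mentioned above, on simulated data (the data frame `dat` and its columns are hypothetical). It only compares object sizes and confirms that `summary()` and `predict()` with new data still work; it is not a recipe for the 50M-row case.

```r
set.seed(1)
n   <- 10000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

fit_full  <- lm(y ~ x1 + x2, data = dat)
# x = FALSE and y = FALSE are already the defaults; model = FALSE drops the
# stored copy of the model frame
fit_small <- lm(y ~ x1 + x2, data = dat, model = FALSE, x = FALSE, y = FALSE)

object.size(fit_full)
object.size(fit_small)

# summary() and predict() on new data do not need the stored model frame
newdat  <- data.frame(x1 = c(0, 1), x2 = c(1, 0))
p_full  <- predict(fit_full, newdat)
p_small <- predict(fit_small, newdat)
s_small <- summary(fit_small)
```

Note that the `qr`, `residuals` and `fitted.values` components still grow with the number of rows, so `model = FALSE` alone reduces the object but does not make it independent of n.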

Fast linear regression by group

那年仲夏 submitted on 2019-11-27 05:37:07
Question: I have 500K users and I need to compute a linear regression (with intercept) for each of them. Each user has around 30 records. I tried with dplyr and lm, and this is way too slow: around 2 seconds per user.

    df %>%
      group_by(user_id, add = FALSE) %>%
      do(lm = lm(Y ~ x, data = .)) %>%
      mutate(lm_b0 = summary(lm)$coeff[1],
             lm_b1 = summary(lm)$coeff[2]) %>%
      select(user_id, lm_b0, lm_b1) %>%
      ungroup()

I tried to use lm.fit, which is known to be faster, but it doesn't seem to be compatible with dplyr. Is
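
A base R sketch of the `lm.fit` idea mentioned above, assuming columns named `user_id`, `x` and `Y` as in the question. The data are simulated, and `fit_one` is a hypothetical helper; it skips the formula and model-frame overhead of `lm()` by passing the design matrix directly.

```r
set.seed(1)
df <- data.frame(user_id = rep(1:100, each = 30), x = rnorm(3000))
df$Y <- 2 + 0.5 * df$x + rnorm(3000)

# Fit intercept + slope for one user's rows with lm.fit (no formula parsing)
fit_one <- function(d) {
  lm.fit(x = cbind(1, d$x), y = d$Y)$coefficients
}

# One row of coefficients per user
coefs <- t(vapply(split(df, df$user_id), fit_one, numeric(2)))
colnames(coefs) <- c("lm_b0", "lm_b1")

result <- data.frame(user_id = as.integer(rownames(coefs)), coefs,
                     row.names = NULL)
head(result)
```

For a single predictor the slope can also be written in closed form as cov(x, Y) / var(x) per group, which avoids fitting a model object at all.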

Set one or more of coefficients to a specific integer

送分小仙女□ submitted on 2019-11-27 05:25:24
I am using a standard lm model and would like to set the coefficients of one or more of my variables to a specific integer. For example, I would like the coefficient of my weather and price variables to be 647 and 15 respectively. I am using the lm function with a standard formula. The closest things I've found so far are the offset function within glm, or restrict.rhs within systemfit. I've also looked at subtracting the total contribution from these variables with their coefficients set, but this is not very scalable. I'm aware of all the issues setting a coefficient has, but would like to
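
A minimal sketch of fixing coefficients with `offset()` inside an `lm()` formula, using the values 647 and 15 from the question. The data frame `d`, the response `sales` and the free predictor `promo` are hypothetical, since the original model is not shown.

```r
set.seed(1)
n <- 200
d <- data.frame(weather = rnorm(n), price = rnorm(n), promo = rnorm(n))
d$sales <- 100 + 647 * d$weather + 15 * d$price + 3 * d$promo + rnorm(n)

# weather and price enter with fixed coefficients 647 and 15;
# only the intercept and the promo coefficient are estimated
fit_fixed <- lm(sales ~ promo + offset(647 * weather + 15 * price), data = d)

# Same estimates obtained by subtracting the fixed contribution by hand
fit_check <- lm(I(sales - 647 * weather - 15 * price) ~ promo, data = d)

coef(fit_fixed)
coef(fit_check)
```

The offset form keeps the fixed terms inside the model, so `predict()` on new data that contains weather and price includes their contribution automatically.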

Print R-squared for all of the models fit with lmList

隐身守侯 submitted on 2019-11-27 03:41:50
Question: I used lmList to fit 480 relationships and I would like the R2 of each of these. Here is an example dataset and model which are pretty close to what it really looks like, except I have 480 eu (experimental units):

    eu  mass  day
    11  .02   1
    11  .03   2
    11  .04   3
    11  .06   4
    12  .01   1
    12  .03   2
    12  .04   3
    12  .05   4

    fit<-lmList(mass ~ day | eu, data=df)

Printing fit or summary does not give me the information I want. I am ultimately trying to make a new dataframe that will look like:

    eu  intercept  slope  R2
    11  .01
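
A sketch that builds the requested table from the example rows above. To stay free of package dependencies it swaps `lmList` for a base R `split()` plus `lapply()`; an lmList object is also list-like, so the same `sapply()` calls may well carry over to it, but that is an assumption not tested here.

```r
df <- data.frame(
  eu   = rep(c(11, 12), each = 4),
  mass = c(.02, .03, .04, .06, .01, .03, .04, .05),
  day  = rep(1:4, times = 2)
)

# One lm per experimental unit (base R stand-in for lmList)
fits <- lapply(split(df, df$eu), function(d) lm(mass ~ day, data = d))

# Collect intercept, slope and R-squared into one data frame
out <- data.frame(
  eu        = names(fits),
  intercept = sapply(fits, function(m) coef(m)[["(Intercept)"]]),
  slope     = sapply(fits, function(m) coef(m)[["day"]]),
  R2        = sapply(fits, function(m) summary(m)$r.squared),
  row.names = NULL
)
out
```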

Why does lm run out of memory while matrix multiplication works fine for coefficients?

让人想犯罪 __ submitted on 2019-11-27 03:20:28
Question: I am trying to do fixed effects linear regression with R. My data looks like

    dte  yr  id  v1  v2
    .    .   .   .   .
    .    .   .   .   .
    .    .   .   .   .

I then decided to simply do this by making yr a factor and using lm:

    lm(v1 ~ factor(yr) + v2 - 1, data = df)

However, this seems to run out of memory. I have 20 levels in my factor and df is 14 million rows, which takes about 2GB to store. I am running this on a machine with 22 GB dedicated to this process. I then decided to try things the old fashioned way: create dummy
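
A sketch of the matrix route on simulated data, assuming it means solving the normal equations X'X b = X'y. The column names follow the question; the data and sizes are made up. Only the small cross-product matrices are kept after the design matrix is formed, whereas `lm()` additionally stores a QR decomposition, residuals, fitted values and a model frame, all of which grow with the number of rows.

```r
set.seed(1)
n  <- 2000
df <- data.frame(yr = rep(2000:2019, length.out = n), v2 = rnorm(n))
df$v1 <- 0.1 * (df$yr - 2000) + 2 * df$v2 + rnorm(n)

# Dummy columns for the 20 years plus v2, no intercept
X <- model.matrix(~ factor(yr) + v2 - 1, data = df)

# Normal equations: the matrices being solved are only 21 x 21 and 21 x 1
beta <- solve(crossprod(X), crossprod(X, df$v1))

# Reference fit for comparison on this small example
fit <- lm(v1 ~ factor(yr) + v2 - 1, data = df)
cbind(normal_eq = drop(beta), lm = coef(fit))
```

The normal equations are less numerically stable than the QR approach used by `lm()`, so this trade is mainly worthwhile when memory is the binding constraint and the design is well conditioned.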

Create and Call Linear Models from List

戏子无情 submitted on 2019-11-27 02:59:05
Question: So I'm trying to compare different linear models in order to determine if one is better than another. However, I have several models, so I want to create a list of models and then call on them. Is that possible?

    Models <- list(lm(y~a),lm(y~b),lm(y~c))
    Models2 <- list(lm(y~a+b),lm(y~a+c),lm(y~b+c))
    anova(Models2[1],Models[1])

Thank you for your help!

Answer 1: If you have two lists of models, and you want to compare each pair of models, then you want Map:

    models1 <- list(lm(y ~ a), lm(y ~ b), lm(y
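
A runnable sketch of the `Map` approach named in the answer, on simulated data (the data frame `d` is hypothetical, and the pairs are chosen here so that each smaller model is nested in its partner). It also shows that a single model has to be extracted with `[[ ]]`, because `[ ]` returns a list of length one rather than the model itself.

```r
set.seed(1)
d <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
d$y <- 1 + d$a + 0.5 * d$b + rnorm(50)

models1 <- list(lm(y ~ a, data = d), lm(y ~ b, data = d), lm(y ~ c, data = d))
models2 <- list(lm(y ~ a + b, data = d), lm(y ~ b + c, data = d),
                lm(y ~ a + c, data = d))

# Single comparison: [[ ]] extracts the lm object itself
first <- anova(models1[[1]], models2[[1]])

# Pairwise comparison of the i-th model in each list
comparisons <- Map(anova, models1, models2)
comparisons[[1]]
```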

Linear Regression and storing results in data frame [duplicate]

不打扰是莪最后的温柔 submitted on 2019-11-27 02:29:44
Question: This question already has an answer here: Linear Regression and group by in R (10 answers). I am running a linear regression on some variables in a data frame. I'd like to be able to subset the linear regressions by a categorical variable, run the linear regression for each category, and then store the t-stats in a data frame. I'd like to do this without a loop if possible. Here's a sample of what I'm trying to do:

    a<- c("a","a","a","a","a", "b","b","b","b","b", "c","c","c","c","c")
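
A loop-free sketch using `split()` and `lapply()`. Only the grouping vector `a` appears in the question before it is cut off, so the predictor `x`, the response `y` and the output column names are hypothetical.

```r
set.seed(1)
d <- data.frame(a = rep(c("a", "b", "c"), each = 5), x = rnorm(15))
d$y <- 1 + 2 * d$x + rnorm(15)

# One regression per level of a; keep the "t value" column of each summary
t_mat <- do.call(rbind, lapply(split(d, d$a), function(g) {
  coef(summary(lm(y ~ x, data = g)))[, "t value"]
}))

# Store the t-statistics in a data frame, one row per group
t_stats <- data.frame(group = rownames(t_mat), t_mat, row.names = NULL)
names(t_stats) <- c("group", "t_intercept", "t_x")
t_stats
```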