lm | 易学教程

Linear regression in R with if statement [duplicate]

Question: This question already has an answer here: How to run linear model in R with certain data range? (1 answer). Closed 3 years ago.

I have a dummy variable `black`, where `black==0` is White and `black==1` is Black. I am trying to fit a linear model with `lm` for the `black==1` category only, but running the code below gives me incorrect coefficients. Is there a way in R to run a model with an if statement, similar to Stata?

    library(foreign)
    df <- read.dta("hw4.dta")
    attach(df)
    black[black==0] <- NA
    model3
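
A minimal sketch of one common way to restrict a fit to a subgroup: the `subset` argument of `lm()`. The file `hw4.dta` is not available here, so the built-in `mtcars` data stands in for it, with the 0/1 column `am` playing the role of `black`; the formula `mpg ~ wt` is purely illustrative.

```r
# Stand-in data: mtcars ships with R; `am` (0/1) plays the role of `black`.
# `subset` keeps only the rows where the condition is TRUE.
fit_subset <- lm(mpg ~ wt, data = mtcars, subset = am == 1)

# Equivalent approach: filter the data frame first, then fit.
fit_filtered <- lm(mpg ~ wt, data = mtcars[mtcars$am == 1, ])

# Both fits see the same rows, so the coefficients agree.
stopifnot(isTRUE(all.equal(coef(fit_subset), coef(fit_filtered))))
nobs(fit_subset)  # number of rows actually used in the fit
```

Passing `data =` explicitly also avoids `attach()`, which makes it harder to tell which copy of a variable a formula is reading.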

How `poly()` generates orthogonal polynomials? How to understand the “coefs” returned?

My understanding of orthogonal polynomials is that they take the form

    y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6) + ...

up to the number of terms desired, where a1, a2, etc. are the coefficients of each orthogonal term (they vary between fits), and c1, c2, etc. are coefficients within the orthogonal terms, determined such that the terms maintain orthogonality (they are consistent between fits using the same x values).

I understand `poly()` is used to fit orthogonal polynomials. An example:

    x = c(1.160, 1.143, 1.126, 1.109, 1.079, 1.053, 1.040, 1.027, 1.015, 1.004, 0.994, 0.985, 0.977) #
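
A small sketch, using the `x` values quoted above, of what `poly()` returns. Strictly speaking `poly()` only builds the basis matrix and `lm()` does the fitting; the checks below show that the columns are centred and orthonormal, and that the `"coefs"` attribute is what `predict()` uses to re-evaluate the same basis. The degree of 3 is an arbitrary choice for illustration.

```r
x <- c(1.160, 1.143, 1.126, 1.109, 1.079, 1.053, 1.040,
       1.027, 1.015, 1.004, 0.994, 0.985, 0.977)

# Basis matrix with one column per degree (1, 2, 3).
P <- poly(x, degree = 3)

# Columns are orthonormal: the cross-product is the identity up to rounding.
G <- crossprod(unclass(P))

# Each column is also orthogonal to the constant term, so it sums to zero.
col_sums <- colSums(unclass(P))

# The "coefs" attribute is a list with components `alpha` and `norm2`.
coefs <- attr(P, "coefs")
names(coefs)

# predict() uses those stored constants to rebuild the basis at given x values.
P_again <- predict(P, x)
```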

linear model with `lm`: how to get prediction variance of sum of predicted values

Question: I'm summing the predicted values from a linear model with multiple predictors, as in the example below, and want to calculate the combined variance, standard error and possibly confidence intervals for this sum.

    lm.tree <- lm(Volume ~ poly(Girth,2), data = trees)

Suppose I have a set of Girths:

    newdat <- list(Girth = c(10,12,14,16))

for which I want to predict the total Volume:

    pr <- predict(lm.tree, newdat, se.fit = TRUE)
    total <- sum(pr$fit)
    # [1] 111.512

How can I obtain the variance for
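
One possible approach, sketched under the assumption that the quantity of interest is the sum of the fitted means (not of new noisy observations): the sum is a linear combination of the coefficients, so its variance follows from `vcov()`. The helper names `X`, `w`, `var_total`, `se_total` and `ci` are introduced here for illustration.

```r
lm.tree <- lm(Volume ~ poly(Girth, 2), data = trees)
newdat <- data.frame(Girth = c(10, 12, 14, 16))

# Rebuild the design matrix for the new rows the same way predict() does,
# so that poly() reuses the basis estimated from the original data.
tt <- delete.response(terms(lm.tree))
X <- model.matrix(tt, model.frame(tt, newdat))

# The total is w' beta with w = column sums of X, so Var(total) = w' V w.
w <- colSums(X)
total <- sum(w * coef(lm.tree))
var_total <- drop(t(w) %*% vcov(lm.tree) %*% w)
se_total <- sqrt(var_total)

# 95% confidence interval for the summed mean response.
ci <- total + c(-1, 1) * qt(0.975, df.residual(lm.tree)) * se_total
```

Simply adding up `pr$se.fit^2` would ignore the covariance between the individual predictions, which share the same estimated coefficients.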

How to minimize size of object of class “lm” without compromising it being passed to predict()

Question: I want to run `lm()` on a large dataset with 50M+ observations and 2 predictors. The analysis is run on a remote server with only 10GB for storing the data. I have tested `lm()` on 10K observations sampled from the data, and the resulting object had a size of 2GB+. I need the object of class "lm" returned from `lm()` ONLY to produce the summary statistics of the model (`summary(lm_object)`) and to make predictions (`predict(lm_object)`). I have done some experiments with the options model, x, y, qr of

Fast linear regression by group

Question: I have 500K users and I need to compute a linear regression (with intercept) for each of them. Each user has around 30 records. I tried with dplyr and lm, and this is way too slow: around 2 seconds per user.

    df %>%
      group_by(user_id, add = FALSE) %>%
      do(lm = lm(Y ~ x, data = .)) %>%
      mutate(lm_b0 = summary(lm)$coeff[1], lm_b1 = summary(lm)$coeff[2]) %>%
      select(user_id, lm_b0, lm_b1) %>%
      ungroup()

I tried to use lm.fit, which is known to be faster, but it doesn't seem to be compatible with dplyr. Is
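
A hedged base-R sketch of calling `lm.fit()` once per group without dplyr. The data are simulated stand-ins (three users instead of 500K), the column names follow the question, and the helper `fit_one` is hypothetical; no timing claim is made here.

```r
set.seed(1)
df <- data.frame(user_id = rep(1:3, each = 30), x = rnorm(90))
df$Y <- 2 + 0.5 * df$x + rnorm(90, sd = 0.1)

# lm.fit() takes a design matrix and a response directly, skipping the
# formula parsing and model-frame construction that lm() performs.
fit_one <- function(d) {
  unname(lm.fit(cbind(1, d$x), d$Y)$coefficients)  # intercept, slope
}

# One row per user: split by user, fit each piece, bind the results.
coefs <- t(vapply(split(df, df$user_id), fit_one, numeric(2)))
colnames(coefs) <- c("lm_b0", "lm_b1")
coefs
```

The row names of `coefs` carry the `user_id` values, so the matrix can be turned into a data frame with `data.frame(user_id = rownames(coefs), coefs)`.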

Set one or more of coefficients to a specific integer

I am using a standard lm model and would like to set the coefficients of one or more of my variables to a specific integer. For example, I would like the coefficient of my weather and price variables to be 647 and 15 respectively. I am using the lm function with a standard formula. The closest things I've found so far are the offset function within glm, or restrict.rhs within systemfit. I've also looked at subtracting the total contribution from these variables with their coefficients set, but this is not very scalable. I'm aware of all the issues setting a coefficient has, but would like to
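
As a sketch of the offset idea mentioned above: `lm()` formulas also accept `offset()`, which adds a term whose coefficient is fixed at 1. The data below are simulated, and the free predictor `other` is a hypothetical name; only `weather`, `price` and the values 647 and 15 come from the question.

```r
set.seed(7)
n <- 200
dat <- data.frame(weather = rnorm(n), price = rnorm(n), other = rnorm(n))
dat$y <- 3 + 647 * dat$weather + 15 * dat$price + 2 * dat$other + rnorm(n)

# offset() enters the linear predictor with its coefficient fixed at 1,
# so weather and price are held at 647 and 15 while `other` is estimated.
fit_fixed <- lm(y ~ other + offset(647 * weather + 15 * price), data = dat)

# Same estimates, obtained by moving the fixed part to the left-hand side.
fit_moved <- lm(I(y - 647 * weather - 15 * price) ~ other, data = dat)

coef(fit_fixed)
```

The two fits give the same free coefficients; the offset form has the advantage that `predict()` adds the fixed contribution back automatically.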

Print R-squared for all of the models fit with lmList

Question: I used lmList to fit 480 relationships and I would like the R2 of each of these. Here is an example dataset and model which are pretty close to what it really looks like, except I have 480 eu (experimental units):

    eu  mass  day
    11  .02   1
    11  .03   2
    11  .04   3
    11  .06   4
    12  .01   1
    12  .03   2
    12  .04   3
    12  .05   4

    fit <- lmList(mass ~ day | eu, data=df)

Printing fit or summary does not give me the information I want. I am ultimately trying to make a new dataframe that will look like:

    eu  intercept  slope  R2
    11  .01

Why does lm run out of memory while matrix multiplication works fine for coefficients?

Question: I am trying to do fixed effects linear regression with R. My data looks like

    dte  yr  id  v1  v2
    .    .   .   .   .
    .    .   .   .   .
    .    .   .   .   .

I then decided to simply do this by making yr a factor and using lm:

    lm(v1 ~ factor(yr) + v2 - 1, data = df)

However, this seems to run out of memory. I have 20 levels in my factor and df is 14 million rows, which takes about 2GB to store. I am running this on a machine with 22 GB dedicated to this process. I then decided to try things the old-fashioned way: create dummy

Create and Call Linear Models from List

Question: So I'm trying to compare different linear models in order to determine if one is better than another. However, I have several models, so I want to create a list of models and then call on them. Is that possible?

    Models <- list(lm(y~a),lm(y~b),lm(y~c))
    Models2 <- list(lm(y~a+b),lm(y~a+c),lm(y~b+c))
    anova(Models2[1],Models[1])

Thank you for your help!

Answer 1: If you have two lists of models, and you want to compare each pair of models, then you want Map:

    models1 <- list(lm(y ~ a), lm(y ~ b), lm(y
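
A self-contained sketch of the `Map` idea, using simulated data because `y`, `a`, `b` and `c` are not given. The pairs are arranged here so that each smaller model is nested in its partner, which is an assumption of this example and not the ordering used in the question. A single model is extracted from a list with `[[1]]`; `[1]` returns a one-element list, which `anova()` cannot use.

```r
set.seed(42)
dat <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
dat$y <- 1 + 2 * dat$a - dat$b + rnorm(50)

small <- list(lm(y ~ a, data = dat),
              lm(y ~ b, data = dat),
              lm(y ~ c, data = dat))
large <- list(lm(y ~ a + b, data = dat),
              lm(y ~ b + c, data = dat),
              lm(y ~ a + c, data = dat))

# Map() walks both lists in parallel and calls anova(small[[i]], large[[i]]).
comparisons <- Map(anova, small, large)

# Comparing one pair by hand needs [[ ]] to pull the model out of the list.
first_pair <- anova(small[[1]], large[[1]])
comparisons[[1]]
```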

Linear Regression and storing results in data frame [duplicate]

Question: This question already has an answer here: Linear Regression and group by in R (10 answers).

I am running a linear regression on some variables in a data frame. I'd like to be able to subset the linear regressions by a categorical variable, run the linear regression for each level of the categorical variable, and then store the t-stats in a data frame. I'd like to do this without a loop if possible. Here's a sample of what I'm trying to do:

    a<- c("a","a","a","a","a", "b","b","b","b","b", "c","c","c","c","c")