lm | 易学教程

How do regression models deal with the factor variables?

Suppose I have data with a factor and a response variable. My questions: How do linear regression and mixed-effects models work with factor variables? If I fit a separate model for each level of the factor variable (m3 and m4), how does that differ from models m1 and m2? Which one is the best model/approach? As an example I use the Orthodont data from the nlme package:

    library(nlme)
    data = Orthodont
    data2 <- subset(data, Sex=="Male")
    data3 <- subset(data, Sex=="Female")
    m1 <- lm(distance ~ age + Sex, data = Orthodont)
    m2 <- lme(distance ~ age, data = Orthodont, random = ~ 1|Sex)
    m3 <- lm(distance ~
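
A minimal sketch of how the factor enters each kind of model, assuming base R and using the built-in ToothGrowth data as a stand-in for Orthodont (supp plays the role of Sex, dose the role of age). An additive model like m1 fits one common slope and shifts the intercept per factor level (treatment contrasts by default); separate per-level fits like m3 and m4 let both intercept and slope vary, and their point estimates match a single pooled interaction model.

```r
# Built-in data: len (response), dose (numeric), supp (factor: "OJ", "VC")
d <- ToothGrowth

# Additive model (like m1): one common slope for dose, intercept shifted per supp level
m_add <- lm(len ~ dose + supp, data = d)

# Separate fits per level (like m3/m4): own intercept and own slope for each level
fits <- lapply(split(d, d$supp), function(s) lm(len ~ dose, data = s))

# Pooled interaction model: same point estimates as the separate fits,
# but one residual variance shared by both levels
m_int <- lm(len ~ dose * supp, data = d)

# Rebuild the VC line from the interaction model's coefficients
b <- coef(m_int)
vc_from_int <- c(b[["(Intercept)"]] + b[["suppVC"]], b[["dose"]] + b[["dose:suppVC"]])

print(coef(m_add))
print(rbind(separate_VC = unname(coef(fits$VC)), from_interaction = vc_from_int))
```

With only two levels, a random intercept for Sex (as in m2) has just two groups from which to estimate a variance, so that variance is very poorly determined; treating Sex as a fixed effect, as m1 does, is the more usual choice. The separate fits and the interaction model differ mainly in that the latter pools the residual variance, which is what makes between-group tests convenient.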

Linear model with categorical variables in R

Question: I am trying to fit a linear model with some categorical variables:

    model <- lm(price ~ carat+cut+color+clarity)
    summary(model)

The output is:

    Call:
    lm(formula = price ~ carat + cut + color + clarity)

    Residuals:
         Min       1Q   Median       3Q      Max
    -11495.7   -688.5   -204.1    458.2   9305.3

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) -3696.818     47.948 -77.100  < 2e-16 ***
    carat        8843.877     40.885 216.311  < 2e-16 ***
    cut.L         755.474     68.378  11.049  < 2e-16 ***
    cut.Q        -349.587     60.432  -5.785 7.74e-09 ***
    cut.C
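
The .L, .Q and .C suffixes come from R's default coding of ordered factors: orthogonal polynomial contrasts (contr.poly), i.e. linear, quadratic, cubic and so on, which suggests cut is an ordered factor here, as it is in ggplot2's diamonds data. A minimal sketch with simulated stand-in data (the diam data frame and its generating coefficients are invented for illustration) shows the default coding and how to request one dummy per level instead.

```r
# Hypothetical stand-in for diamonds: cut is an ordered factor with 5 levels
set.seed(1)
n <- 200
lev <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
diam <- data.frame(
  carat = runif(n, 0.2, 2),
  cut = factor(sample(lev, n, replace = TRUE), levels = lev, ordered = TRUE)
)
diam$price <- 500 + 4000 * diam$carat + 200 * as.integer(diam$cut) + rnorm(n, sd = 300)

# Ordered factors default to polynomial contrasts: cut.L, cut.Q, cut.C, cut^4
m_poly <- lm(price ~ carat + cut, data = diam)

# Treatment contrasts instead: one dummy per level, compared with "Fair"
m_trt <- lm(price ~ carat + cut, data = diam,
            contrasts = list(cut = "contr.treatment"))

print(names(coef(m_poly)))
print(names(coef(m_trt)))
```

Both codings give identical fitted values; only the meaning of the individual coefficients changes. To change the default for all ordered factors, base R's options(contrasts = c("contr.treatment", "contr.treatment")) can be used.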

Extract model summaries and store them as a new column

Question: I'm new to the purrr paradigm and am struggling with it. Following a few sources, I have managed to nest a data frame, run a linear model on the nested data, extract some coefficients from each lm, and generate a summary for each lm. The last thing I want to do is extract the "r.squared" from each summary (which I would have thought would be the simplest part of what I'm trying to achieve), but for whatever reason I can't get the syntax right. Here's a MWE of what I have that
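
Because a summary.lm object is a list, r.squared is simply one of its named elements. A minimal base-R sketch, swapping purrr's map() for lapply()/vapply() and using mtcars grouped by cyl as hypothetical stand-in data:

```r
# One lm per group (cyl), then one summary per lm
d <- mtcars
fits <- lapply(split(d, d$cyl), function(s) lm(mpg ~ wt, data = s))
sums <- lapply(fits, summary)

# Pull the named element r.squared out of every summary
r2 <- vapply(sums, function(s) s$r.squared, numeric(1))

# Store it as a column next to the group identifier, one row per group
res <- data.frame(cyl = names(r2), r.squared = unname(r2))
print(res)
```

In a nested-tibble workflow the corresponding step would typically be mutate(r2 = map_dbl(summary_col, "r.squared")), where summary_col is a hypothetical name for the list-column holding the summaries; map_dbl() accepts a string as an element extractor.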

Piecewise regression with a straight line and a horizontal line joining at a break point

Question: I want to do a piecewise linear regression with one break point, where the second half of the regression line has slope = 0. There are examples of how to do a piecewise linear regression, such as here. The problem I'm having is that I'm not clear how to fix the slope of one half of the model to 0. I tried

    lhs <- function(x) ifelse(x < k, k-x, 0)
    rhs <- function(x) ifelse(x < k, 0, x-k)
    fit <- lm(y ~ lhs(x) + rhs(x))

where k is the break point, but the segment on the right is not a flat / horizontal
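
The rhs(x) term is what gives the right segment its own slope; dropping it leaves only lhs(x), which is zero for x >= k, so the fit is flat there. A minimal sketch with simulated data, using the equivalent basis pmin(x, k) (which equals k - lhs(x)) and a simple grid search for k; the grid search is one possible way to estimate the break point, not a method from the question.

```r
# Simulated data (hypothetical): rising line up to x = 6, flat afterwards
set.seed(42)
x <- seq(0, 10, length.out = 101)
y <- 2 + 1.5 * pmin(x, 6) + rnorm(length(x), sd = 0.3)

# For a fixed break point k, pmin(x, k) has slope 1 below k and slope 0 above,
# so the fitted line is forced to be horizontal to the right of k
fit_at <- function(k) lm(y ~ pmin(x, k))

# Estimate k by a grid search that minimises the residual sum of squares
grid <- seq(1, 9, by = 0.05)
rss <- vapply(grid, function(k) deviance(fit_at(k)), numeric(1))
k_hat <- grid[which.min(rss)]
best <- fit_at(k_hat)

print(k_hat)
print(coef(best))
```

Note that standard errors from the final lm() treat k_hat as known, so they understate the uncertainty that comes from estimating the break point.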

Cluster-Robust Standard Errors in Stargazer

Question: Does anyone know how to get stargazer to display clustered SEs for lm models? (And the corresponding F-test?) If possible, I'd like to follow an approach similar to computing heteroskedasticity-robust SEs with sandwich and passing them to stargazer, as in http://jakeruss.com/cheatsheets/stargazer.html#robust-standard-errors-replicating-statas-robust-option. I'm using lm to get my regression models, and I'm clustering by firm (a factor variable that I'm not including in the regression models)
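
One way to get the SE vector is to compute the cluster-robust covariance directly. A minimal base-R sketch of the CR1 estimator (the Stata-style small-sample correction), on simulated panel data; the data, the firm structure and the helper name cluster_vcov are hypothetical.

```r
# Hypothetical panel: 40 firms, 10 observations each, with firm-level shocks
set.seed(7)
firm <- factor(rep(seq_len(40), each = 10))
x <- rnorm(400) + rep(rnorm(40), each = 10)
y <- 1 + 0.5 * x + rep(rnorm(40), each = 10) + rnorm(400)
fit <- lm(y ~ x)

# CR1 cluster-robust covariance:
# V = c * (X'X)^-1 [sum_g X_g' u_g u_g' X_g] (X'X)^-1,  c = G/(G-1) * (n-1)/(n-k)
cluster_vcov <- function(fit, cluster) {
  X <- model.matrix(fit)
  u <- residuals(fit)
  n <- nrow(X); k <- ncol(X); G <- length(unique(cluster))
  bread <- solve(crossprod(X))
  # Sum the scores x_i * u_i within each cluster: one row per cluster
  scores <- rowsum(X * u, group = cluster)
  meat <- crossprod(scores)
  adj <- (G / (G - 1)) * ((n - 1) / (n - k))
  adj * bread %*% meat %*% bread
}

V <- cluster_vcov(fit, firm)
cl_se <- sqrt(diag(V))
print(cbind(default = coef(summary(fit))[, "Std. Error"], clustered = cl_se))
```

The cluster vector must line up with the rows actually used in the fit (no dropped NA rows). The resulting cl_se can then be handed to stargazer through its se argument, e.g. stargazer(fit, se = list(cl_se)), in the same way the linked cheatsheet passes robust SEs; sandwich::vcovCL() is a packaged alternative to the hand-rolled function.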

Fast group-by simple linear regression

Question: This Q&A arises from "How to make group_by and lm fast?", where the OP was trying to do a simple linear regression per group for a large data frame. In theory, a series of group-by regressions y ~ x | g is equivalent to a single pooled regression y ~ x * g. The latter is very appealing because statistical tests between different groups are straightforward. But in practice, fitting this larger regression is not computationally easy. My answer on the linked Q&A reviews the packages speedlm and glm4, but
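
For a single predictor, the per-group least-squares coefficients have a closed form, so they can be computed for every group at once without fitting one lm per group. A minimal base-R sketch of that swapped-in technique (not speedlm or glm4), using rowsum() for the group sums; group_lm is a hypothetical helper name.

```r
# Closed-form OLS for y ~ x within each group, computed for all groups at once:
#   slope_g     = sum((x - xbar_g) * (y - ybar_g)) / sum((x - xbar_g)^2)
#   intercept_g = ybar_g - slope_g * xbar_g
group_lm <- function(x, y, g) {
  g <- droplevels(as.factor(g))   # assumes no NA in x, y or g
  idx <- as.integer(g)            # maps each row to its group's position
  n <- tabulate(idx, nbins = nlevels(g))
  xbar <- rowsum(x, g)[, 1] / n   # rowsum() returns groups in level order
  ybar <- rowsum(y, g)[, 1] / n
  xc <- x - xbar[idx]             # centre each observation on its group mean
  yc <- y - ybar[idx]
  slope <- rowsum(xc * yc, g)[, 1] / rowsum(xc^2, g)[, 1]
  data.frame(group = levels(g), intercept = unname(ybar - slope * xbar),
             slope = unname(slope))
}

res <- group_lm(mtcars$wt, mtcars$mpg, mtcars$cyl)
print(res)
```

This gives point estimates only; standard errors and between-group tests would need the residual sums of squares as well, which can be accumulated with rowsum() in the same way.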

regression on subsets for unique factor combinations using lm

I would like to automate a simple multiple regression for the subsets defined by the unique combinations of the grouping variables. I have a dataframe with several grouping variables df1[,1:6] and some independent variables df1[,8:10] and a response df1[,7]. This is an excerpt from the data. structure(list(Surface = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("NiAu", "Sn"), class = "factor"), Supplier = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), ParticleSize = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
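
Since the data excerpt is cut off, here is a minimal sketch on a hypothetical stand-in for df1 with two grouping factors, one response (y) and two predictors (x1, x2); the column names and values are invented. split() with a list of factors creates one subset per observed combination, and lm() is then applied to each.

```r
# Hypothetical stand-in for df1: two grouping factors, one response, two predictors
set.seed(3)
df1 <- data.frame(
  Surface  = factor(rep(c("NiAu", "Sn"), each = 20)),
  Supplier = factor(rep(c("A", "B"), times = 20)),
  x1 = runif(40),
  x2 = runif(40)
)
df1$y <- 1 + 2 * df1$x1 - df1$x2 + rnorm(40, sd = 0.05)

# One subset per combination of the grouping variables;
# drop = TRUE skips combinations that do not occur in the data
groups <- split(df1, df1[c("Surface", "Supplier")], drop = TRUE)
fits <- lapply(groups, function(s) lm(y ~ x1 + x2, data = s))

# Collect the coefficients into one table, one row per combination
coef_tab <- do.call(rbind, lapply(fits, coef))
print(coef_tab)
```

Adding more grouping columns to df1[c(...)] extends this to all six grouping variables, at the cost of fewer rows per subset.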

Function which runs lm over different variables

I would like to create a function which can run a regression model (e.g. using lm) over different variables in a given dataset. In this function, I would specify as arguments the dataset I'm using, the dependent variable y and the independent variable x. I want this to be a function and not a loop, as I would like to call the code in various places of my script. My naive function would look something like this:

    lmfun <- function(data, y, x) {
      lm(y ~ x, data = data)
    }

This function obviously does not work because the lm function does not recognize y and x as variables of the dataset. I have done
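
One common approach, assuming the variable names are passed as strings rather than bare names, is to build the formula with reformulate() so that lm() looks the names up in data.

```r
# Build the formula from column names given as strings, then fit on `data`
lmfun <- function(data, y, x) {
  f <- reformulate(x, response = y)   # e.g. mpg ~ wt + hp
  lm(f, data = data)
}

fit <- lmfun(mtcars, "mpg", c("wt", "hp"))
print(coef(fit))
```

The printed call will show formula = f rather than the expanded formula; the fit itself is the same as writing lm(mpg ~ wt + hp, data = mtcars) by hand.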

Format of R's lm() Formula with a Transformation

I can't quite figure out how to do the following in one line:

    data(attenu)
    x_temp = attenu$accel^(1/4)
    y_temp = log(attenu$dist)
    best_line = lm(y_temp ~ x_temp)

Since the above works, I thought I could do the following:

    data(attenu)
    best_line = lm( log(attenu$dist) ~ (attenu$accel^(1/4)) )

But this gives the error:

    Error in terms.formula(formula, data = data) : invalid power in formula

There's obviously something I'm missing when using transformed variables in R's formula format. Why doesn't this work?

You're looking for the function I so that the ^ operator is treated as arithmetic in the
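
Inside a formula, ^ is a formula operator (crossing terms up to a given order), so accel^(1/4) is read as a request for interactions and fails. Wrapping the expression in I() makes it ordinary arithmetic; a minimal sketch on the same built-in attenu data:

```r
data(attenu)  # built-in dataset from the datasets package

# I() protects ^ so it is evaluated as a power, not as formula crossing
best_line <- lm(log(dist) ~ I(accel^(1/4)), data = attenu)

# The two-step version with temporary variables, for comparison
x_temp <- attenu$accel^(1/4)
y_temp <- log(attenu$dist)
two_step <- lm(y_temp ~ x_temp)

print(coef(best_line))
print(coef(two_step))
```

Functions such as log() need no protection; only operators that have a formula meaning (^, +, -, *, /, :) do.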

lm called from inside dlply throws “0 (non-NA) cases” error [r]

I'm using dlply() with a custom function that averages slopes of lm() fits on data that contain some NA values, and I get the error

    Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
      0 (non-NA) cases

This error only happens when I call dlply with two key variables; separating by one variable works fine. Annoyingly, I can't reproduce the error with a simple dataset, so I've posted the problem dataset in my dropbox. Here's the code, as minimized as possible while still producing an error:

    masterData <- read.csv("http://dl.dropbox.com/u/48901983/SOquestionData.csv", na
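
lm() raises "0 (non-NA) cases" when a subset has no complete rows left after NA removal, and splitting by two keys makes such all-NA subsets more likely, which is one plausible cause here. A minimal sketch of a guard inside the per-group function, using base split() instead of dlply() and hypothetical data; the same check can be placed in the function passed to dlply().

```r
# Hypothetical data: group "b" has only missing y values
d <- data.frame(
  g = rep(c("a", "b", "c"), each = 5),
  x = rep(1:5, 3),
  y = c(2, 4, 6, 8, 10, rep(NA, 5), 1, 3, NA, 7, 9)
)

# Return NA instead of calling lm() when fewer than two complete cases remain
safe_slope <- function(s) {
  ok <- complete.cases(s$x, s$y)
  if (sum(ok) < 2) return(NA_real_)
  unname(coef(lm(y ~ x, data = s[ok, ]))[2])
}

slopes <- vapply(split(d, d$g), safe_slope, numeric(1))
print(slopes)
```

Averaging the results with mean(slopes, na.rm = TRUE) then skips the groups that could not be fitted.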