lm | 易学教程

How to run lm for each subset of the data frame, and then aggregate the result? [duplicate]

Question: This question already has answers here: Linear Regression and group by in R (10 answers). Closed 3 years ago.

I have a big data frame df with columns named age, income, and country. What I want to do is actually very simple: run

    fitFunc <- function(thisCountry) {
      # keep only the rows for this country
      subframe <- df[which(df$country == thisCountry), ]
      fit <- lm(income ~ 0 + age, data = subframe)
      return(coef(fit))
    }

for each individual country, then aggregate the results into a new data frame that looks like:

      countryname  coeffname
    1 USA          1.2
    2 GB           1.0
    3
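One way to get the per-country table is to split the data frame by country, fit the same no-intercept model to each piece, and collect the slopes. The sketch below uses base R only; the simulated data and the true slope of 1.1 are assumptions for illustration, while the column names follow the question.

```r
# Hypothetical data: column names follow the question, values are simulated.
set.seed(1)
df <- data.frame(
  country = rep(c("USA", "GB", "FR"), each = 20),
  age     = runif(60, min = 20, max = 60)
)
df$income <- 1.1 * df$age + rnorm(60)

# Fit income ~ 0 + age separately within each country.
fits <- lapply(split(df, df$country), function(sub) {
  lm(income ~ 0 + age, data = sub)
})

# Collect the single slope per country into one data frame.
result <- data.frame(
  countryname = names(fits),
  coeffname   = vapply(fits, function(f) coef(f)[["age"]], numeric(1),
                       USE.NAMES = FALSE)
)
print(result)
```

The split/lapply pattern keeps each fitted model around in `fits`, so other summaries (standard errors, R-squared) can be pulled out later without refitting.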

What is the red solid line in the “residuals vs leverage” plot produced by `plot.lm()`?

Question:

    fit <- lm(dist ~ speed, cars)
    plot(fit, which = 5)

What does the solid red line in the middle of the plot mean? I think it is not about Cook's distance.

Answer 1: It is a LOWESS smoother line (panel.smooth() calls lowess() with span 2/3), obtained by smoothing standardised residuals against leverage. Internally in plot.lm(), the variable xx is the leverage, while rsp holds the Pearson residuals (i.e., standardised residuals). The scatter plot and the red solid line are then drawn via:

    graphics::panel.smooth(xx, rsp)

Here is
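The red curve can be approximated by hand, which makes it easier to see what is being smoothed. The sketch below assumes the smoother settings are the `panel.smooth()` defaults (`span = 2/3`, `iter = 3`) and uses `hatvalues()` and `rstandard()` as stand-ins for the internal `xx` and `rsp` variables.

```r
fit <- lm(dist ~ speed, data = cars)

lev <- hatvalues(fit)   # leverage of each observation
rs  <- rstandard(fit)   # standardised residuals

# Smooth residuals against leverage with the assumed panel.smooth() defaults.
smooth <- lowess(lev, rs, f = 2/3, iter = 3)

# Draw the points and the red curve only when a graphics device makes sense.
if (interactive()) {
  plot(lev, rs, xlab = "Leverage", ylab = "Standardized residuals")
  lines(smooth, col = "red")
}

head(as.data.frame(smooth))
```

`lowess()` returns the smoothed curve as a list with components `x` (sorted leverage values) and `y` (smoothed residuals), so it can be passed straight to `lines()`.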

Use Predict on data.table with Linear Regression

Question: Regarding this post, I have created an example to play with linear regression using the data.table package, as follows:

    ## rm(list=ls()) # anti-social
    library(data.table)
    set.seed(1011)
    DT = data.table(group=c("b","b","b","a","a","a"),
                    v1=rnorm(6), v2=rnorm(6), y=rnorm(6))
    setkey(DT, group)
    ans <- DT[, as.list(coef(lm(y~v1+v2))), by = group]

which returns:

       group (Intercept)        v1        v2
    1:     a    1.374942 -2.151953 -1.355995
    2:     b   -2.292529  3.029726 -9.894993

I am able to obtain the coefficients of the lm function. My
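The question is cut off here, but going by its title, the next step is getting predictions per group. A minimal sketch, assuming the goal is in-sample fitted values stored in a new column: fit the model on `.SD` inside `j` and assign the result with `:=`. The larger simulated table and the column name `pred` are assumptions, chosen so that each group has more rows than coefficients.

```r
library(data.table)

# Hypothetical data: 10 rows per group so the fit is not saturated.
set.seed(1011)
DT <- data.table(group = rep(c("a", "b"), each = 10),
                 v1 = rnorm(20), v2 = rnorm(20))
DT[, y := 1 + 2 * v1 - v2 + rnorm(20, sd = 0.1)]

# Coefficients per group, as in the question.
coefs <- DT[, as.list(coef(lm(y ~ v1 + v2))), by = group]

# Fitted values per group, written back as a new column.
DT[, pred := predict(lm(y ~ v1 + v2, data = .SD)), by = group]

print(coefs)
print(head(DT))
```

For out-of-sample predictions, one option is to keep the fitted models in a list column and call `predict()` with `newdata` for the matching group.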

plot.lm(): extracting numbers labelled in the diagnostic Q-Q plot

Question: For the simple example below, you can see that certain points are identified in the ensuing plots. How can I extract the row numbers identified in these plots, especially the Normal Q-Q plot?

    set.seed(2016)
    maya <- data.frame(rnorm(100))
    names(maya)[1] <- "a"
    maya$b <- rnorm(100)
    mara <- lm(b~a, data=maya)
    plot(mara)

I tried using str(mara) to see if I could find a list there, but I can't see any of the numbers from the Normal Q-Q plot there. Thoughts?

Answer 1: I have edited your
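The labels are not stored in the fitted object; they are computed at plot time. As far as I can tell, `plot.lm()` labels the `id.n` most extreme points (default `id.n = 3`), ranked in the Q-Q plot by absolute standardised residual. Under that assumption, the same row labels can be recovered directly:

```r
set.seed(2016)
maya <- data.frame(a = rnorm(100), b = rnorm(100))
mara <- lm(b ~ a, data = maya)

# Standardised residuals, named by row label.
rs <- rstandard(mara)

# Assumed rule: the id.n = 3 largest |standardised residuals| get labelled.
id_n <- 3
labelled <- names(sort(abs(rs), decreasing = TRUE))[seq_len(id_n)]
print(labelled)
```

Because `labelled` holds row names as character strings, `maya[labelled, ]` returns the corresponding rows.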

extracting p values from multiple linear regression (lm) inside of a ddply function using spatial data

Question: I have a set of spatial coordinate (x, y) data that has a response variable for each coordinate over the course of several years. The following code generates a similar data frame:

    df <- data.frame(
      id = rep(1:2, 2),
      x = rep(c(25, 30), 10),
      y = rep(c(100, 200), 10),
      year = rep(1980:1989, 2),
      response = rnorm(20)
    )

The resulting data frame:

    head(df)
      id  x   y year   response
    1  1 25 100 1980  0.1707431
    2  2 30 200 1981  1.3562263
    3  1 25 100 1982 -0.4590506
    4  2 30 200 1983  1.3238410
    5  1 25 100 1984  1.7765772
    6  2 30 200 1985 -0.6258069

I want to run a linear regression on each cell through time to get the
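The question is truncated, so the following is only a sketch of one plausible reading: regress `response` on `year` within each cell and keep the slope and its p-value. It uses base R `split()`/`lapply()` in place of `ddply`; the seed and the output column names `slope` and `p_value` are assumptions.

```r
set.seed(42)
df <- data.frame(
  id = rep(1:2, 10),
  x = rep(c(25, 30), 10),
  y = rep(c(100, 200), 10),
  year = rep(1980:1989, 2),
  response = rnorm(20)
)

# One regression per cell; the p-value sits in the coefficient table
# returned by summary(), in the column "Pr(>|t|)".
per_cell <- lapply(split(df, df$id), function(cell) {
  cf <- summary(lm(response ~ year, data = cell))$coefficients
  data.frame(id = cell$id[1], x = cell$x[1], y = cell$y[1],
             slope = cf["year", "Estimate"],
             p_value = cf["year", "Pr(>|t|)"])
})
pvals <- do.call(rbind, per_cell)
print(pvals)
```

The same anonymous function could be handed to `plyr::ddply(df, "id", ...)` if the plyr workflow from the title is preferred.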

Differences in Linear Regression in R and Python [closed]

Closed. This question is off-topic and is not currently accepting answers. Closed 2 years ago.

I was trying to match the linear regression results from R with those from Python, comparing the coefficients for each independent variable. Below is the code. The data is uploaded here:

https://www.dropbox.com/s/oowe4irm9332s78/X.csv?dl=0
https://www.dropbox.com/s/79scp54unzlbwyk/Y.csv?dl=0

R code:

    # define pathname
    pathname = " "
    X <- read.csv(file.path(pathname,"X.csv"),stringsAsFactors = F)
    Y <- read.csv(file.path(pathname,"Y.csv"

Get all models from leaps regsubsets

Question: I used regsubsets to search for models. Is it possible to automatically create all lm fits from the list of parameter selections?

    library(leaps)
    leaps <- regsubsets(y ~ x1 + x2 + x3, data, nbest=1, method="exhaustive")
    summary(leaps)$which

      (Intercept)    x1    x2   x3
    1        TRUE FALSE FALSE TRUE
    2        TRUE FALSE  TRUE TRUE
    3        TRUE  TRUE  TRUE TRUE

Now I would manually do model_1 <- lm(y ~ x3) and so on. How can this be automated so that the models end up in a list?

Answer 1: I don't know why you want a list of all models. summary and
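Each row of the logical `which` matrix can be turned into a formula and passed to `lm()`. To stay free of the leaps dependency, the sketch below hard-codes a matrix copied from the output in the question as a stand-in for `summary(leaps)$which`; the simulated `data` is an assumption.

```r
# Hypothetical data with the variable names used in the question.
set.seed(1)
data <- data.frame(x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
data$y <- 1 + 0.5 * data$x2 + 2 * data$x3 + rnorm(30)

# Stand-in for summary(leaps)$which, copied from the question's output.
which_mat <- rbind(
  "1" = c(TRUE, FALSE, FALSE, TRUE),
  "2" = c(TRUE, FALSE, TRUE,  TRUE),
  "3" = c(TRUE, TRUE,  TRUE,  TRUE)
)
colnames(which_mat) <- c("(Intercept)", "x1", "x2", "x3")

# Build one formula per row from the selected predictors and fit it.
models <- lapply(seq_len(nrow(which_mat)), function(i) {
  vars <- setdiff(colnames(which_mat)[which_mat[i, ]], "(Intercept)")
  lm(reformulate(vars, response = "y"), data = data)
})
names(models) <- paste0("model_", seq_along(models))

lapply(models, formula)
```

`reformulate()` builds the formula from a character vector, which avoids pasting strings together and parsing them.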

fixed effects in R: plm vs lm + factor()

Question: I'm trying to run a fixed effects regression model in R. I want to control for heterogeneity in variables C and D (neither is a time variable). I tried the following two approaches:

1) Use the plm package:

    formula = Y ~ A + B + C + D
    reg = plm(formula, data= data, index=c('C','D'), method = 'within')

This gives me the following error message:

    Error in pdim.default(index[[1]], index[[2]]) : duplicate couples (time-id)

I also tried first creating a panel using

    data_p = pdata.frame(data,index=c('C','D
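For the `lm + factor()` route named in the title, the grouping variables enter the formula as dummy variables (the least-squares dummy variable approach), which does not require each (C, D) pair to be unique. The sketch below is illustrative only: the data are simulated, and the true slopes of 2 and -1 and the group effects are assumptions.

```r
# Hypothetical data: C and D are grouping variables with repeated pairs.
set.seed(123)
n <- 200
data <- data.frame(
  C = sample(letters[1:5], n, replace = TRUE),
  D = sample(LETTERS[1:4], n, replace = TRUE),
  A = rnorm(n),
  B = rnorm(n)
)
c_eff <- setNames(rnorm(5), letters[1:5])   # group-specific intercept shifts
d_eff <- setNames(rnorm(4), LETTERS[1:4])
data$Y <- unname(2 * data$A - 1 * data$B +
                 c_eff[data$C] + d_eff[data$D] + rnorm(n, sd = 0.5))

# Fixed effects via dummy variables for both grouping factors.
fit <- lm(Y ~ A + B + factor(C) + factor(D), data = data)
print(coef(fit)[c("A", "B")])
```

With many levels in C or D the dummy-variable fit becomes slow and the coefficient table long, which is one reason to prefer a within-transformation package when the panel structure allows it.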

Applying lm() and predict() to multiple columns in a data frame

Question: I have an example dataset below.

    train <- data.frame(x1 = c(4,5,6,4,3,5),
                        x2 = c(4,2,4,0,5,4),
                        x3 = c(1,1,1,0,0,1),
                        x4 = c(1,0,1,1,0,0),
                        x5 = c(0,0,0,1,1,1))

Suppose I want to create separate models for columns x3, x4, and x5 based on columns x1 and x2. For example:

    lm1 <- lm(x3 ~ x1 + x2)
    lm2 <- lm(x4 ~ x1 + x2)
    lm3 <- lm(x5 ~ x1 + x2)

I then want to take these models, apply them to a testing set using predict, and create a matrix that has each model's outcome as a column.

    test <- data.frame
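One way to avoid writing each model by hand is to loop over the response names, build the formula with `reformulate()`, and bind the predictions into a matrix. The `train` data comes from the question; the `test` data frame is hypothetical because the original is cut off.

```r
train <- data.frame(x1 = c(4, 5, 6, 4, 3, 5),
                    x2 = c(4, 2, 4, 0, 5, 4),
                    x3 = c(1, 1, 1, 0, 0, 1),
                    x4 = c(1, 0, 1, 1, 0, 0),
                    x5 = c(0, 0, 0, 1, 1, 1))

# Hypothetical test set (the one in the question is truncated).
test <- data.frame(x1 = c(4, 3, 6), x2 = c(1, 5, 2))

responses <- c("x3", "x4", "x5")

# One model per response column, all with the same predictors.
models <- lapply(responses, function(r) {
  lm(reformulate(c("x1", "x2"), response = r), data = train)
})

# One column of predictions per model, one row per test observation.
preds <- sapply(models, predict, newdata = test)
colnames(preds) <- responses
print(preds)

# Equivalent shortcut: a multi-response fit gives the same matrix.
mfit <- lm(cbind(x3, x4, x5) ~ x1 + x2, data = train)
print(predict(mfit, newdata = test))
```

The `cbind()` form is shorter, while the list of separate models is easier to extend if the responses later need different predictors.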