lm

R data.table loop subset by factor and do lm()

痞子三分冷 submitted on 2019-12-01 05:31:33
I am trying to write a function, or at least work out how to run a loop in data.table syntax, that subsets the table by a factor (in this case the id variable), runs a linear model on each subset, and outputs the results. Sample data below. df <- data.frame(id = letters[1:3], cyl = sample(c("a","b","c"), 30, replace = TRUE), factor = sample(c(TRUE, FALSE), 30, replace = TRUE), hp = sample(c(20:50), 30, replace = TRUE)) dt=as.data.table(df) fit <- lm(hp ~ cyl + factor, data = df) #how do I get the [i] to work here to subset and iterate by each factor and also do it in data.table syntax?
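A minimal sketch of one way to do this, assuming the `data.table` package is installed. The data is a deterministic stand-in for the question's randomly sampled `df` (same columns), chosen so that every `id` group contains all three `cyl` levels and both values of `factor`; with `sample()`, a small group can end up with a single level, and `lm()` then stops with a contrasts error. Instead of an explicit loop with `[i]`, `by = id` does the subsetting, and each fitted model is stored in a list column.

```r
library(data.table)

# Deterministic stand-in for the question's sample data (assumption: the real
# data has the same columns id, cyl, factor, hp).
df <- data.frame(
  id     = rep(letters[1:3], each = 10),
  cyl    = rep(c("a", "b", "c"), length.out = 30),
  factor = rep(c(TRUE, FALSE), length.out = 30),
  hp     = 20 + (1:30 * 7) %% 31
)
dt <- as.data.table(df)

# One model per id: .SD holds the group's rows (without the by column), and
# wrapping the fit in list() stores each lm object in a list column.
fits <- dt[, .(model = list(lm(hp ~ cyl + factor, data = .SD))), by = id]

# Pull the coefficients of every stored model into one long table.
coefs <- fits[, {
  cf <- coef(model[[1]])
  list(term = names(cf), estimate = unname(cf))
}, by = id]
print(coefs)

# Base-R equivalent without data.table: split by id, then fit each piece.
fits_base <- lapply(split(df, df$id), function(d) lm(hp ~ cyl + factor, data = d))
```

Keeping whole `lm` objects (rather than only coefficients) means `summary()`, `predict()` or `confint()` can still be applied to each group's model afterwards.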

Why do I get NA coefficients and how does `lm` drop reference level for interaction

感情迁移 submitted on 2019-12-01 05:26:40
I am trying to understand how R determines reference groups for interactions in a linear model. Consider the following: df <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("1", "2", "3", "4", "5"), class = "factor"), year = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("1", "2"), class = "factor"), treatment = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
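The question's data is cut off, so the following is a hypothetical panel laid out the same way (5 ids, 6 rows each, `year` alternating 1/2, `treatment` fixed within each id). It sketches the usual cause: `lm()` reports `NA` for a coefficient whose model-matrix column is an exact linear combination of other columns (it is "aliased"); that is separate from the reference level. With the default treatment contrasts, the first level of each factor never gets a column, and the interaction columns are built from the non-reference levels.

```r
# Hypothetical data mimicking the question's layout (assumption: treatment
# does not vary within id, as in the visible part of the question's data).
df <- data.frame(
  id        = factor(rep(1:5, each = 6)),
  year      = factor(rep(1:2, times = 15)),
  treatment = factor(rep(c(2, 1, 1, 2, 2), each = 6))
)
df$y <- 10 * sin(seq_len(nrow(df)))   # arbitrary deterministic response

# Default treatment contrasts: level "1" of each factor is the reference, so
# the columns are id2..id5, treatment2, year2 and treatment2:year2.
levels(df$treatment)[1]

fit <- lm(y ~ id + treatment * year, data = df)
coef(fit)    # treatment2 is NA

# alias() lists the exact linear dependency behind the NA:
# treatment2 = (Intercept) - id2 - id3, because ids 1, 4 and 5 are treated.
alias(fit)
```

The id dummies already absorb everything that is constant within an id, so the treatment main effect cannot be estimated separately, while `treatment2:year2` still can. `relevel()` changes which level is the reference, but it does not remove this kind of aliasing.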

How does the subset argument work in the lm() function?

社会主义新天地 submitted on 2019-12-01 04:18:57
I have been trying to figure out how the subset argument in R's lm() function works. In particular, the following code looks dubious to me: data(mtcars) summary(lm(mpg ~ wt, data=mtcars)) summary(lm(mpg ~ wt, cyl, data=mtcars)) In both cases the regression has 32 observations dim(lm(mpg ~ wt, cyl ,data=mtcars)$model) [1] 32 2 dim(lm(mpg ~ wt ,data=mtcars)$model) [1] 32 2 yet the coefficients change (along with the R²). The help page doesn't provide much information on this matter: "subset: an optional vector specifying a subset of observations to be used in the fitting process". As a general principle
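A short sketch of what happens in that call. Because `data=` is matched by name, the positional `cyl` lands in `lm()`'s third formal argument, `subset`. A numeric `subset` is used as row indices, not as a condition: `cyl` only takes the values 4, 6 and 8, so the 32 "observations" are repeated copies of rows 4, 6 and 8 of `mtcars`. That keeps the row count at 32 while changing the coefficients.

```r
data(mtcars)

# cyl is numeric (4, 6, 8), so it selects rows 4, 6 and 8 over and over.
fit_cyl <- lm(mpg ~ wt, data = mtcars, subset = cyl)

# Same thing written as explicit row indexing.
fit_idx <- lm(mpg ~ wt, data = mtcars[mtcars$cyl, ])
all.equal(coef(fit_cyl), coef(fit_idx))   # TRUE

# Only three distinct cars (three distinct wt values) enter the fit.
unique(model.frame(fit_cyl)$wt)

# A logical subset keeps only the matching rows, which is usually what is meant.
fit_4 <- lm(mpg ~ wt, data = mtcars, subset = cyl == 4)
nrow(model.frame(fit_4))                  # 11
```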

Plot conditional density curve `P(Y|X)` along a linear regression line

南楼画角 submitted on 2019-12-01 03:53:34
This is my data frame, with two columns Y (response) and X (covariate): ## Editor edit: use `dat` not `data` dat <- structure(list(Y = c(NA, -1.793, -0.642, 1.189, -0.823, -1.715, 1.623, 0.964, 0.395, -3.736, -0.47, 2.366, 0.634, -0.701, -1.692, 0.155, 2.502, -2.292, 1.967, -2.326, -1.476, 1.464, 1.45, -0.797, 1.27, 2.515, -0.765, 0.261, 0.423, 1.698, -2.734, 0.743, -2.39, 0.365, 2.981, -1.185, -0.57, 2.638, -1.046, 1.931, 4.583, -1.276, 1.075, 2.893, -1.602, 1.801, 2.405, -5.236, 2.214, 1.295, 1.438, -0.638, 0.716, 1.004, -1.328, -1.759, -1.315, 1.053, 1.958, -2.034, 2.936, -0.078, -0.676, -2
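The question's `dat` is truncated, so this sketch uses simulated `X`/`Y` data (an assumption) with a leading `NA` like the original. Under the normal linear model, Y | X = x0 is normal with mean b0 + b1 * x0 and standard deviation equal to the residual standard error, so each conditional density can be computed with `dnorm()` and drawn sideways at its `x0`. The positions `x_at` and the drawing `width` are arbitrary choices.

```r
# Simulated stand-in for the question's dat (columns Y and X).
set.seed(42)
dat <- data.frame(X = runif(100, 0, 10))
dat$Y <- 1 + 0.5 * dat$X + rnorm(100)
dat$Y[1] <- NA                       # the question's data starts with an NA

fit <- lm(Y ~ X, data = dat)         # rows with NA are dropped by default
s   <- sigma(fit)                    # residual standard error

# Density of Y | X = x0 under the fitted normal linear model.
cond_density <- function(fit, x0, y_grid) {
  mu <- unname(predict(fit, newdata = data.frame(X = x0)))
  dnorm(y_grid, mean = mu, sd = s)
}

x_at   <- c(2, 5, 8)                 # where to draw the curves
y_grid <- seq(min(dat$Y, na.rm = TRUE), max(dat$Y, na.rm = TRUE),
              length.out = 200)
width  <- 1.5                        # horizontal size of each curve, in X units

plot(Y ~ X, data = dat, col = "grey50")
abline(fit, lwd = 2)
for (x0 in x_at) {
  d <- cond_density(fit, x0, y_grid)
  # Y stays on the vertical axis; the density is added to x0 horizontally.
  lines(x0 + width * d / max(d), y_grid, col = "red")
  abline(v = x0, lty = 3)
}
```

Each red curve peaks where it meets the regression line, since the line is the conditional mean.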

linear regression using lm() - surprised by the result

半城伤御伤魂 submitted on 2019-12-01 03:47:16
I used linear regression on my data via the lm function. Everything works (no error message), but I'm somewhat surprised by the result: I have the impression that R "misses" a group of points, i.e. the intercept and slope are not the best fit. For instance, I am referring to the group of points at coordinates x=15-25, y=0-20. My questions: is there a function to compare the fit using "expected" coefficients with the fit using the "lm-calculated" coefficients? Have I made a silly mistake in my code that leads lm to do this? Following some answers, additional information on x and y: x and y are both visual
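One way to check the fit, sketched on hypothetical data (replace `x` and `y` with the real vectors): `lm(y ~ x)` minimises the residual sum of squares (RSS) of vertical distances, so no other intercept/slope pair can give a smaller RSS. The `guess` coefficients below are a made-up "expected" line. Regressing `x` on `y` and inverting the slope shows how much the result depends on measuring errors vertically rather than horizontally.

```r
# Hypothetical data standing in for the real x and y.
set.seed(1)
x <- runif(50, 0, 40)
y <- 5 + 1.2 * x + rnorm(50, sd = 8)

fit <- lm(y ~ x)

# Residual sum of squares for any line y = b[1] + b[2] * x.
rss <- function(b) sum((y - (b[1] + b[2] * x))^2)

guess <- c(0, 1.5)     # hypothetical "expected" intercept and slope
rss(coef(fit))         # the smallest RSS any straight line can reach
rss(guess)             # never below the value above

# lm(y ~ x) uses vertical distances only; x-on-y gives a different line.
fit_xy <- lm(x ~ y)
c(y_on_x = unname(coef(fit)[2]), x_on_y_inverted = unname(1 / coef(fit_xy)[2]))
```

The inverted x-on-y slope is never flatter than the y-on-x slope, so a line drawn by eye, which tends to judge perpendicular distances, often lies between the two.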

Repeat the resampling function 1000 times? Using lapply?

≯℡__Kan透↙ submitted on 2019-12-01 00:17:13
Please help me out! I'd appreciate any help. Thanks! I am having trouble repeating the resampling 1000 times. I tried using replicate() to do that but it's not working. Is there any other way to do it? Can anyone show me whether this can be done using lapply? Here is my code: #sampling 1000 betas0 & 1 (coefficients) from the data get.beta=function(data,indices){ data=data[indices,] #let boot to select sample lm.out=lm(y ~ x,data=data) return(lm.out$coefficients) } n=nrow(data) get.beta
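A sketch of three equivalent ways to repeat the resampling, using hypothetical data named `dat` (so it does not mask `utils::data`). `get.beta` is the question's function, unchanged apart from formatting; it already has the `(data, indices)` signature that `boot::boot()` expects. The `boot` package ships with standard R installations as a recommended package.

```r
# Hypothetical data standing in for the question's data (columns x and y).
set.seed(123)
dat <- data.frame(x = rnorm(40))
dat$y <- 2 + 3 * dat$x + rnorm(40)

# The question's statistic: coefficients of y ~ x on the selected rows.
get.beta <- function(data, indices) {
  data <- data[indices, ]
  lm.out <- lm(y ~ x, data = data)
  coef(lm.out)
}

n <- nrow(dat)
B <- 1000

# 1) replicate(): evaluates the expression B times; gives a 2 x B matrix.
betas_rep <- replicate(B, get.beta(dat, sample(n, n, replace = TRUE)))

# 2) lapply(): one resample per list element, then stack into a B x 2 matrix.
betas_lap <- do.call(rbind, lapply(seq_len(B), function(i)
  get.beta(dat, sample(n, n, replace = TRUE))))

# 3) boot(): draws the indices itself; the B x 2 results are in bt$t.
library(boot)
bt <- boot(dat, get.beta, R = B)

apply(betas_rep, 1, sd)   # bootstrap standard errors of intercept and slope
```

The key point for `replicate()` and `lapply()` is that a fresh `sample()` of row indices must be drawn inside each repetition.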

Use of offset in lm regression - R

痴心易碎 submitted on 2019-11-30 22:27:43
I have this program: dens <- read.table('DensPiu.csv', header = FALSE) fl <- read.table('FluxPiu.csv', header = FALSE) mydata <- data.frame(c(dens),c(fl)) dat = subset(mydata, dens>=3.15) colnames(dat) <- c("x", "y") attach(dat) and I want to do a least-squares regression on the data contained in dat. The function has the form y ~ a + b*x, and I want the regression line to pass through a specific point P(x0,y0) (which is not the origin). I'm trying to do it like this: x0 <- 3.15 y0 <-283.56 regression <- lm(y ~ I(x-x0)-1, offset=y0) (I think that data = dat is not necessary in this case) but I
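A sketch on hypothetical data in `dat` (columns `x` and `y`). `lm()` documents that `offset` must match the extent of the response, so the constant `y0` has to be repeated once per row. Passing `data = dat` explicitly also makes `attach()` unnecessary. An equivalent approach shifts both variables so that P(x0, y0) becomes the origin.

```r
# Hypothetical data standing in for the question's dat.
set.seed(7)
dat <- data.frame(x = seq(3.2, 5, length.out = 30))
dat$y <- 283.56 + 40 * (dat$x - 3.15) + rnorm(30, sd = 5)

x0 <- 3.15
y0 <- 283.56

# Model y = y0 + b * (x - x0): no intercept, y0 supplied as a per-row offset.
fit_off <- lm(y ~ 0 + I(x - x0), data = dat, offset = rep(y0, nrow(dat)))

# Equivalent: regress the shifted response on the shifted predictor.
fit_shift <- lm(I(y - y0) ~ 0 + I(x - x0), data = dat)

b <- unname(coef(fit_off))
a <- y0 - b * x0      # intercept of y = a + b * x on the original scale
c(a = a, b = b)
a + b * x0            # equals y0: the line passes through P(x0, y0)
```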

Linear models in R with different combinations of variables

别来无恙 submitted on 2019-11-30 21:26:08
I am new to R and I am stuck with a problem. I am reading a set of data from a table and I want to perform linear modeling. Below is how I read my data, and my variable names: >data =read.table(datafilename,header=TRUE) >names(data) [1] "price" "model" "size" "year" "color" What I want to do is create several linear models using different combinations of the variables (price being the target), such as: > attach(data) > model1 = lm(price~model+size) > model2 = lm(price~model+year) > model3 = lm(price~model+color) > model4 = lm(price~model+size) > model4 = lm(price~size+year+color) #...
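Rather than writing each model by hand, the formulas can be generated. This sketch uses hypothetical data with the question's column names, builds every non-empty combination of predictors with `combn()`, turns each into a `price ~ ...` formula with `reformulate()`, and fits them all with `lapply()`. Ranking by AIC at the end is just one possible comparison.

```r
# Hypothetical data with the question's column names.
set.seed(2024)
n <- 60
dat <- data.frame(
  model = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  size  = runif(n, 1, 5),
  year  = sample(2000:2015, n, replace = TRUE),
  color = factor(sample(c("red", "blue"), n, replace = TRUE))
)
dat$price <- 1000 + 300 * dat$size + 50 * (dat$year - 2000) + rnorm(n, sd = 100)

predictors <- c("model", "size", "year", "color")

# Every non-empty subset of predictors: choose 1, 2, 3 and 4 of them.
combos <- unlist(
  lapply(seq_along(predictors), function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Build price ~ ... formulas and fit one model per combination.
formulas <- lapply(combos, reformulate, response = "price")
fits <- lapply(formulas, function(f) lm(f, data = dat))
names(fits) <- vapply(formulas, function(f) paste(deparse(f), collapse = " "),
                      character(1))

# Compare the models, e.g. by AIC (lower is better).
sort(vapply(fits, AIC, numeric(1)))
```

With four predictors this gives 2^4 - 1 = 15 models; passing `data = dat` also avoids the need for `attach()`.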