lm | 易学教程

R: polynomial shortcut notation in nls() formula

With the linear model function lm(), polynomial formulas can contain a shortcut notation like this: m <- lm(y ~ poly(x, 3)). This shortcut keeps the user from having to create x^2 and x^3 variables or type them into the formula as I(x^2) + I(x^3). Is there a comparable notation for the nonlinear function nls()? poly(x, 3) is rather more than just a shortcut for x + I(x^2) + I(x^3): by default it produces orthogonal polynomials, which have the nice property of being uncorrelated: options(digits = 2) x <- runif(100) var(cbind(x, x ^ 2, x ^ 3)) # x # x 0.074 0.073 0.064 # 0.073 0.077 0
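The difference between raw powers and the basis returned by poly() can be checked directly. The following is a minimal sketch on simulated data; the seed, the response y, and the names raw, orth, fit_raw, and fit_orth are assumptions made for illustration and do not come from the original post.

```r
# Sketch: compare raw powers of x with the columns returned by poly().
set.seed(1)                                       # arbitrary seed, for reproducibility only
x <- runif(100)
y <- 1 + 2 * x - 3 * x^2 + rnorm(100, sd = 0.1)   # made-up response

raw  <- cbind(x, x^2, x^3)   # raw powers: strongly correlated columns
orth <- poly(x, 3)           # orthogonal polynomial basis of degree 3

print(round(cor(raw), 2))
print(round(cor(orth), 2))   # off-diagonal entries are numerically zero

# Both parameterizations span the same column space, so the fitted values agree.
fit_raw  <- lm(y ~ x + I(x^2) + I(x^3))
fit_orth <- lm(y ~ poly(x, 3))
print(all.equal(fitted(fit_raw), fitted(fit_orth)))
```

If the raw powers are wanted with the short notation, poly(x, 3, raw = TRUE) returns them; the coefficients then match the I(x^2) + I(x^3) form.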

Can't get aggregate() work for regression by group

Question: I want to use aggregate() with this custom function:

    # linear regression function
    CalculateLinRegrDiff = function (sample){
      fit <- lm(value ~ date, data = sample)
      diff(range(fit$fitted))
    }
    dataset2 = aggregate(value ~ id + col, dataset, CalculateLinRegrDiff(dataset))

I receive the error:

    Error in get(as.character(FUN), mode = "function", envir = envir) : object 'FUN' of mode 'function' was not found

What is wrong?

Answer 1: Your syntax for aggregate() is wrong in the first place. Pass function
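One way to run a regression per group is to split the data frame and apply the function to each piece. This is a sketch under assumptions, not necessarily what the truncated answer goes on to recommend. The reasoning is that aggregate() hands FUN a vector of values rather than a data-frame subset, so a function that needs both value and date cannot be used there directly. The toy dataset and the names groups and res are invented for illustration.

```r
# Sketch: per-group regression with split() + sapply() instead of aggregate().
set.seed(42)                                   # arbitrary seed
dataset <- data.frame(                         # made-up data: 4 groups of 5 rows
  id    = rep(1:2, each = 10),
  col   = rep(c("a", "b"), each = 5, times = 2),
  date  = rep(1:5, times = 4),
  value = rnorm(20)
)

# Same idea as the function in the question: range of the fitted values.
CalculateLinRegrDiff <- function(sample) {
  fit <- lm(value ~ date, data = sample)
  diff(range(fitted(fit)))
}

# One data-frame subset per (id, col) combination; drop = TRUE removes empty cells.
groups <- split(dataset, list(dataset$id, dataset$col), drop = TRUE)

# The function object itself is passed, not the result of calling it.
res <- sapply(groups, CalculateLinRegrDiff)
print(res)
```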

Plot fitted line within certain range R

Question: Using R, I would like to plot a linear relationship between two variables, but I would like the fitted line to be drawn only within the range of the data. For example, with the following code, I would like the line to exist only for x and y values in 1:10 (with default parameters the line extends beyond the range of the data points).

    x <- 1:10
    y <- 1:10
    plot(x, y)
    abline(lm(y ~ x))

Answer 1: Instead of using abline(), (a) save the fitted model, (b) use predict.lm() to find the fitted y-values
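Steps (a) and (b) can be sketched as follows. The final step, drawing the segment with lines(), is an assumption about how the truncated answer continues, and the names fit, x_ends, and y_ends are illustrative.

```r
# Sketch: draw the fitted line only across the observed range of x.
x <- 1:10
y <- 1:10

fit <- lm(y ~ x)                                        # (a) save the fitted model

x_ends <- range(x)                                      # smallest and largest observed x
y_ends <- predict(fit, newdata = data.frame(x = x_ends))  # (b) fitted y at both ends

plot(x, y)
lines(x_ends, y_ends)   # a straight line needs only its two end points
```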

regressions with xts in R

Question: Is there a utility to run regressions using xts objects of the following type: lm(y ~ lag(x, 1) + lag(x, 2) + lag(x, 3), data = as.data.frame(coredata(my_xts))), where my_xts is an xts object that contains an x and a y? The point of the question is: is there a way to avoid doing a bunch of lags and merges to build a data.frame with all the lags? I think the dyn package works for zoo objects, so I would expect it to work the same way with xts, but I thought there might be something more up to date. Answer 1:
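For comparison, here is a base-R sketch of building the lagged regressors in one step with embed(). It works on plain numeric vectors and is not the xts- or dyn-based approach the question asks about; the simulated x and y and the column names lag1 to lag3 are assumptions.

```r
# Sketch: lagged regressors via stats::embed(), using plain vectors.
set.seed(1)      # arbitrary seed
n <- 50
x <- rnorm(n)    # stand-in for coredata(my_xts)[, "x"]
y <- rnorm(n)    # stand-in for coredata(my_xts)[, "y"]

# embed(x, 4) has one row per time t = 4..n with columns x[t], x[t-1], x[t-2], x[t-3].
lags <- embed(x, 4)

d <- data.frame(
  y    = y[4:n],     # align the response with the rows of 'lags'
  lag1 = lags[, 2],
  lag2 = lags[, 3],
  lag3 = lags[, 4]
)

fit <- lm(y ~ lag1 + lag2 + lag3, data = d)
print(coef(fit))
```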

Performing lm() and segmented() on multiple columns in R

Question: I am trying to perform lm() and segmented() in R using the same independent variable (x) and multiple dependent response variables (Curve1, Curve2, etc.), one at a time. I wish to extract the estimated break point and model coefficients for each response variable. I include an example of my data below.

              x  Curve1  Curve2   Curve3
    1 -0.236422 98.8169 95.6828 101.7910
    2 -0.198083 98.3260 95.4185 101.5170
    3 -0.121406 97.3442 94.8899 100.9690
    4  0.875399 84.5815 88.0176  93.8424
    5  0.913738 84.1139 87.7533
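The lm() half of the task can be sketched by looping over the response columns. The data frame below holds only the four complete rows visible in the excerpt, and the names dat, responses, fits, and coefs are illustrative. The segmented() step is left out because it needs the segmented package and more data than the excerpt shows; it would presumably be applied to each element of fits in the same loop.

```r
# Sketch: fit one linear model per response column against the same x.
dat <- data.frame(
  x      = c(-0.236422, -0.198083, -0.121406, 0.875399),
  Curve1 = c(98.8169, 98.3260, 97.3442, 84.5815),
  Curve2 = c(95.6828, 95.4185, 94.8899, 88.0176),
  Curve3 = c(101.7910, 101.5170, 100.9690, 93.8424)
)

responses <- setdiff(names(dat), "x")   # every column except the predictor

# reformulate("x", response = r) builds the formula r ~ x for each response name.
fits <- lapply(responses, function(r) lm(reformulate("x", response = r), data = dat))
names(fits) <- responses

# One row of coefficients (intercept, slope) per response variable.
coefs <- t(sapply(fits, coef))
print(coefs)
```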

How does plot.lm() determine outliers for residual vs fitted plot?

How does plot.lm() determine which points are outliers (that is, which points to label) in the residuals vs fitted plot? The only thing I found in the documentation is this: Details: sub.caption (by default the function call) is shown as a subtitle (under the x-axis title) on each plot when plots are on separate pages, or as a subtitle in the outer margin (if any) when there are multiple plots per page. The ‘Scale-Location’ plot, also called ‘Spread-Location’ or ‘S-L’ plot, takes the square root of the absolute residuals in order to diminish skewness (sqrt(|E|) is much less skewed than |E| for
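As far as the labelling goes, plot.lm() has an id.n argument (default 3) giving the number of most extreme points to label. A sketch of the selection for the residuals vs fitted plot, assuming an unweighted fit and using the built-in cars data purely as an example:

```r
# Sketch: the points that plot(fit, which = 1) is expected to label by default.
fit <- lm(dist ~ speed, data = cars)   # example model on a built-in data set

r    <- residuals(fit)
id_n <- 3                              # mirrors the default id.n = 3

# Rank observations by absolute residual and keep the id_n largest.
labelled <- names(sort(abs(r), decreasing = TRUE))[seq_len(id_n)]
print(labelled)

plot(fit, which = 1)                   # compare with the labels drawn on the plot
```

So the labels mark the largest residuals in the sample; they are not the result of a formal outlier test.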

How to remove a lower order parameter in a model when the higher order parameters remain?

The problem: I cannot remove a lower-order parameter (e.g., a main-effect parameter) from a model as long as the higher-order parameters (i.e., interactions) remain in the model. Even when I try to do so, the model is refactored and the new model is not nested in the higher-order model. See the following example (as I am coming from ANOVAs, I use contr.sum):

    d <- data.frame(A = rep(c("a1", "a2"), each = 50), B = c("b1", "b2"), value = rnorm(100))
    options(contrasts=c('contr.sum','contr.poly'))
    m1 <- lm(value ~ A * B, data = d)
    m1
    ## Call:
    ## lm(formula = value ~ A * B, data = d)
    ##
    ## Coefficients:
    ##
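The refactoring described above can be reproduced with a short sketch. It only illustrates the symptom and is not a fix; the seed and the name m2 are assumptions added for the demonstration.

```r
# Sketch: dropping the main effect A from the formula does not remove a parameter.
set.seed(123)   # arbitrary seed
d <- data.frame(A = rep(c("a1", "a2"), each = 50),
                B = c("b1", "b2"),
                value = rnorm(100))

old <- options(contrasts = c("contr.sum", "contr.poly"))  # as in the question

m1 <- lm(value ~ A * B, data = d)       # full model: intercept, A, B, A:B
m2 <- lm(value ~ B + A:B, data = d)     # attempt to drop the main effect of A

print(names(coef(m1)))
print(names(coef(m2)))                  # still four coefficients, recoded

# Same residual sum of squares: m2 is a reparameterization of m1, not a nested model.
print(all.equal(deviance(m1), deviance(m2)))

options(old)                            # restore the previous contrasts setting
```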

Fitting a function in R

I have a few data points (x and y) that seem to have a logarithmic relationship.

    > mydata
        x   y
    1   0 123
    2   2 116
    3   4 113
    4  15 100
    5  48  87
    6  75  84
    7 122  77
    > qplot(x, y, data=mydata, geom="line")

Now I would like to find an underlying function that fits the graph and allows me to infer other data points (e.g. at x = 3 or x = 82). I read about lm and nls but I'm not really getting anywhere. At first, I created a function that I thought most resembled the plot:

    f <- function(x, a, b) { a * exp(b * -x) }
    x <- 0:100
    y <- f(0:100, 1, 1)
    qplot(x, y, geom="line")

Afterwards, I tried to generate a
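Since the relationship looks logarithmic, one simple option is a linear model on a log-transformed x. This is a sketch under assumptions, not the approach the poster ends up with: log1p(x), i.e. log(1 + x), is chosen only because the data contain x = 0, and the names fit, new_x, and pred are illustrative.

```r
# Sketch: fit y = a + b * log(1 + x) with lm() and predict at new x values.
mydata <- data.frame(
  x = c(0, 2, 4, 15, 48, 75, 122),
  y = c(123, 116, 113, 100, 87, 84, 77)
)

fit <- lm(y ~ log1p(x), data = mydata)   # log1p avoids log(0) at x = 0
print(coef(fit))

new_x <- data.frame(x = c(3, 82))        # the points mentioned in the question
pred  <- predict(fit, newdata = new_x)
print(pred)
```

A curve with free nonlinear parameters, such as the exponential f above, would need nls() with starting values instead.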

How to obtain RMSE out of lm result?

Question: I know there is a small difference between $sigma and the concept of root mean squared error. So I am wondering: what is the easiest way to obtain the RMSE from the lm function in R?

    res <- lm(randomData$price ~ randomData$carat +
              randomData$cut + randomData$color +
              randomData$clarity + randomData$depth +
              randomData$table + randomData$x +
              randomData$y + randomData$z)

length(coefficients(res)) is 24, and I cannot build my model manually anymore. So, how can I evaluate the RMSE based on
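The RMSE can be computed from the residuals without touching the coefficients. A minimal sketch, using the built-in mtcars data in place of randomData (an assumption made so the example is runnable):

```r
# Sketch: in-sample RMSE from an lm fit, compared with summary(fit)$sigma.
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model

res  <- residuals(fit)
rmse <- sqrt(mean(res^2))                 # divides by n

sigma_hat <- summary(fit)$sigma           # divides by the residual degrees of freedom

print(c(rmse = rmse, sigma = sigma_hat))

# The two differ only by the factor sqrt(df.residual / n).
print(all.equal(rmse, sigma_hat * sqrt(df.residual(fit) / nrow(mtcars))))
```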

Running a stepwise linear model with BIC criterion

Is it possible to set a stepwise linear model to use the BIC criterion rather than AIC? I've been trying this, but it still calculates each step using AIC values rather than BIC:

    null = lm(data[,1] ~ 1)
    full = lm(data[,1] ~ age + bmi + gender + group)
    step(null, scope = list(lower=null,upper=full), direction="both", criterion = "BIC")

Add the argument k = log(n) to the step function (n being the number of samples in the model matrix). From ?step:

    Arguments: ...
    k    the multiple of the number of degrees of freedom used for the penalty. Only k = 2 gives the genuine AIC; k = log(n) is sometimes referred to as
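A sketch of the k = log(n) suggestion, using the built-in mtcars data and an arbitrary set of candidate predictors in place of the poster's variables (both are assumptions for illustration):

```r
# Sketch: stepwise selection with a BIC-style penalty via step(..., k = log(n)).
null <- lm(mpg ~ 1, data = mtcars)   # intercept-only starting model
n    <- nrow(mtcars)                 # number of observations used in the fit

sel <- step(
  null,
  scope     = list(lower = ~ 1, upper = ~ wt + hp + qsec + drat),
  direction = "both",
  k         = log(n),                # penalty per parameter: log(n) instead of 2
  trace     = 0                      # suppress the step-by-step printout
)

print(formula(sel))
print(BIC(sel))
```

The printed table of step() still labels the column "AIC" even when k = log(n) is used; only the penalty changes.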