Question
I want to use aggregate with this custom function:
# linear regression function
CalculateLinRegrDiff = function (sample){
  fit <- lm(value ~ date, data = sample)
  diff(range(fit$fitted))
}
dataset2 = aggregate(value ~ id + col, dataset, CalculateLinRegrDiff(dataset))
I receive the error:
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'FUN' of mode 'function' was not found
What is wrong?
Answer 1:
Your use of aggregate is wrong in the first place. Pass the function CalculateLinRegrDiff itself to the FUN argument, not the evaluated call CalculateLinRegrDiff(dataset).
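To make the difference concrete, here is a minimal sketch using mean in place of the custom function. The data frame toy is made up for illustration; only its column names mirror the formula in the question.

```r
# Hypothetical toy data; column names mirror the question's formula.
toy <- data.frame(
  id    = c("a", "a", "b", "b"),
  col   = c("x", "x", "x", "x"),
  value = c(1, 3, 10, 20)
)

# Correct: FUN receives the function object itself.
res <- aggregate(value ~ id + col, data = toy, FUN = mean)
print(res)

# Incorrect: mean(toy$value) is evaluated first, so FUN receives a number,
# and aggregate() fails when it tries to resolve that number as a function.
bad <- tryCatch(
  aggregate(value ~ id + col, data = toy, FUN = mean(toy$value)),
  error = function(e) conditionMessage(e)
)
print(bad)
```

The exact wording of the error message can differ between R versions, so the sketch only captures it rather than matching on it.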
Secondly, you've chosen the wrong tool. aggregate can't help you fit a regression by group. It splits the vector on the LHS of ~ according to the combinations of the variables on the RHS, and then applies FUN to each piece of the LHS. That is, FUN should be a function that works on an atomic vector, not a data frame. For example, mean, sd, quantile, etc. are all functions that take an atomic vector as input. CalculateLinRegrDiff expects a data frame as input, and that is not going to work with aggregate.
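A quick way to check what FUN actually receives is to have it report the class of its argument. This is only an illustrative sketch on made-up data (toy is hypothetical):

```r
# Hypothetical toy data with a response, a predictor and one grouping column.
toy <- data.frame(
  f     = c("a", "a", "b"),
  value = c(1, 2, 3),
  date  = c(10, 20, 30)
)

# FUN is handed only the 'value' entries of each group, as a plain vector.
# The 'date' column is not visible inside FUN at all.
seen <- aggregate(value ~ f, data = toy, FUN = function(v) class(v))
print(seen)
```

Each group reports "numeric", which is why a function that needs both value and date cannot be used here.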
Note that sometimes we use cbind on the LHS, like cbind(x, y) ~ f. This means that we apply FUN in parallel to x ~ f and y ~ f. The LHS variables are independent and not used together.
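The sketch below illustrates this on made-up data (toy, x, y and f are hypothetical names), assuming there are no missing values: the cbind form gives the same group means as two separate calls.

```r
# Hypothetical toy data: two responses and one grouping variable.
toy <- data.frame(
  f = c("a", "a", "b", "b"),
  x = c(1, 3, 5, 7),
  y = c(10, 20, 30, 50)
)

# One call with cbind on the LHS ...
both <- aggregate(cbind(x, y) ~ f, data = toy, FUN = mean)

# ... matches two independent calls, one per response.
sep_x <- aggregate(x ~ f, data = toy, FUN = mean)
sep_y <- aggregate(y ~ f, data = toy, FUN = mean)

print(both)
```

FUN never sees x and y together, so this form cannot be used to regress one on the other.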
The right tool for you is the by function. It splits a data frame into sub data frames and applies FUN to each one, so it is ideal for regression by group.
by(dataset[c("value", "date")], dataset[c("id", "col")], CalculateLinRegrDiff)
A simple reproducible example:
set.seed(0)
dataset <- data.frame(value = runif(20), date = runif(20),
f = sample(gl(2, 10)), g = sample(gl(4, 5)))
oo <- by(dataset[c("value", "date")], dataset[c("f", "g")], CalculateLinRegrDiff)
str(oo)
# by [1:2, 1:4] 0.307 0.251 0.109 0.201 0.472 ...
# - attr(*, "dimnames")=List of 2
# ..$ f: chr [1:2] "1" "2"
# ..$ g: chr [1:4] "1" "2" "3" "4"
Since CalculateLinRegrDiff returns a single scalar, by simplifies the result oo to an array rather than a list. This array is like a contingency table, so we can use the "table" method of as.data.frame to reshape it into a data frame:
oo <- as.data.frame.table(oo)
# f g Freq
#1 1 1 0.3069877
#2 2 1 0.2508591
#3 1 2 0.1087895
#4 2 2 0.2007295
#5 1 3 0.4715680
#6 2 3 0.4942069
#7 1 4 0.3223174
#8 2 4 0.4687340
The name "Freq" may be undesirable, but you can easily change it, for example with names(oo)[3] <- "foo".
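As an alternative to renaming afterwards, as.data.frame.table also accepts a responseName argument. A small sketch on deterministic made-up data (toy and the column name slope_span are hypothetical):

```r
# Same function as in the question.
CalculateLinRegrDiff <- function(sample) {
  fit <- lm(value ~ date, data = sample)
  diff(range(fit$fitted.values))
}

# Hypothetical toy data: two observations in each (f, g) cell.
toy <- data.frame(
  value = c(1, 2, 3, 4, 2, 4, 6, 8),
  date  = rep(1:4, times = 2),
  f     = rep(c("a", "b"), each = 4),
  g     = rep(c("u", "v"), times = 4)
)

oo  <- by(toy[c("value", "date")], toy[c("f", "g")], CalculateLinRegrDiff)

# Name the value column directly instead of fixing "Freq" later.
res <- as.data.frame.table(oo, responseName = "slope_span")
print(res)
```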
As I said in my comments under your question, we can also use split and lapply. But then there is no trivial way to convert the result into a good-looking data frame.
datlist <- split(dataset[c("value", "date")], dataset[c("f", "g")], drop = TRUE)
rr <- lapply(datlist, CalculateLinRegrDiff)
stack(rr)
# values ind
#1 0.3069877 1.1
#2 0.2508591 2.1
#3 0.1087895 1.2
#4 0.2007295 2.2
#5 0.4715680 1.3
#6 0.4942069 2.3
#7 0.3223174 1.4
#8 0.4687340 2.4
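One possible workaround, offered only as a sketch, is to split the full data frame (grouping columns included) so that each piece can report its own group labels, then bind the one-row results together. The data frame toy and the column name diff are made up for illustration.

```r
# Same function as in the question.
CalculateLinRegrDiff <- function(sample) {
  fit <- lm(value ~ date, data = sample)
  diff(range(fit$fitted.values))
}

# Hypothetical toy data: two observations in each (f, g) cell.
toy <- data.frame(
  value = c(1, 2, 3, 4, 2, 4, 6, 8),
  date  = rep(1:4, times = 2),
  f     = rep(c("a", "b"), each = 4),
  g     = rep(c("u", "v"), times = 4)
)

# Keep f and g inside each piece so the labels travel with the data.
pieces <- split(toy, toy[c("f", "g")], drop = TRUE)

# Build a one-row data frame per group, then stack them.
rows <- lapply(pieces, function(d) {
  data.frame(f = d$f[1], g = d$g[1], diff = CalculateLinRegrDiff(d))
})
res <- do.call(rbind, rows)
rownames(res) <- NULL
print(res)
```

This is more code than the by approach, which is the point of the remark above: by plus as.data.frame.table gets to the same shape with less effort.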
I suggest you read "Linear Regression and group by in R" for a thorough demonstration of regression by group.
Source: https://stackoverflow.com/questions/51857177/cant-get-aggregate-work-for-regression-by-group