I'm using nlsLM()
to make a model of a power function, but I need to summarize my data within the function call to find the appropriate coefficient and exponent. More specifically, here is what my model code looks like:
Jmod = nlsLM(value~(a)*summarise(funs(mean), (MW)^b),
start = list(a=100000, b = 1/3), data = mod_data,
upper = c(Inf,1), lower = c(0,1/5))
where MW
is the data I am trying to use to predict value
. The data for MW
is already grouped by month based off a variable called datetime
, so I would like to take the monthly average of MW^b
where b
is found by the nlsLM()
statement.
I would take the average beforehand, but as you may realize, this is not mathematically equivalent [ie. ((a+c)/2)^b is not equal to (a^b + c^b)/2].
If anyone has any info on how to do this, I would be greatly appreciative!
EDIT:
Below is the code to make a sample data set of what I'm trying to work with:
structure(list(datetime = structure(c(1514782800, 1480568400,1504242000, 1509512400, 1509512400, 1485925200, 1517461200, 1485925200, 1501563600, 1467349200, 1472706000, 1454302800, 1483246800, 1498885200, 1506834000, 1477976400, 1483246800, 1477976400, 1509512400, 1496293200, 1451624400, 1454302800, 1454302800, 1464757200, 1498885200, 1517461200, 1462078800, 1506834000, 1522558800, 1483246800, 1501563600, 1451624400, 1485925200, 1501563600, 1451624400, 1517461200, 1475298000, 1480568400, 1512104400, 1456808400, 1477976400, 1475298000, 1517461200, 1459486800, 1501563600, 1477976400, 1506834000, 1506834000, 1451624400, 1483246800), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(2863.27837518519, 2878.40382333333, 1236.74444444444, 3522.48888888889, 3522.48888888889, 2033.55555555556, 3305.5, 2033.55555555556, 2094.7037037037, 3052.91875740741, 2960.52222222222, 1733.7918262963, 2850.28673851852, 2841.40740740741, 3310.77538814815, 2266.26172851852, 2850.28673851852, 2266.26172851852, 3522.48888888889, 2802.55555555556, 2196.82556740741, 1733.7918262963, 1733.7918262963, 3001.43703703704, 2841.40740740741, 3305.5, 2061.4826762963, 3310.77538814815, 3107.01851851852, 2850.28673851852, 2094.7037037037, 2196.82556740741, 2033.55555555556, 2094.7037037037, 2196.82556740741, 3305.5, 2848.90322592593, 2878.40382333333, 2873.73476703704, 2208.64755074074, 2266.2172851852, 2848.90322592593, 3305.5, 2021.68765444444, 2094.7037037037, 2266.26172851852, 3310.77538814815, 3310.77538814815, 2196.82556740741, 2850.28673851852), mon = structure(c(2018, 2016.91666666667, 2017.66666666667, 2017.83333333333, 2017.83333333333, 2017.08333333333, 2018.08333333333, 2017.08333333333, 2017.58333333333, 2016.5, 2016.66666666667, 2016.08333333333, 2017, 2017.5, 2017.75, 2016.83333333333, 2017, 2016.83333333333, 2017.83333333333, 2017.41666666667, 2016, 2016.08333333333, 2016.08333333333, 2016.41666666667, 2017.5, 2018.08333333333, 2016.33333333333, 2017.75, 2018.25, 2017, 2017.58333333333, 2016, 2017.08333333333, 2017.58333333333, 2016, 2018.08333333333, 2016.75, 2016.91666666667, 2017.91666666667, 2016.16666666667, 2016.83333333333, 2016.75, 2018.08333333333, 2016.25,2017.58333333333,2016.83333333333, 2017.75, 2017.75, 2016, 2017), class = "yearmon"),
MW = c(2.6142700774997, 4.02670249993547, 0.666666666666667,
0.724114015571947, 4.07197668868287, 3.74122386862433, 3.30097429092907,
3.84858110028323, 0.666666666666667, 0.666666666666667, 4.35000446878457,
0.666666666666667, 0.666666666666667, 3.8371824280444, 0.825077317374,
0.666666666666667, 4.028058457579, 0.666666666666667, 4.3378032532779,
3.84270845997837, 1.40955788986009, 0.666666666666667, 0.666666666666667,
4.05845600900597, 4.00664052392117, 4.0295346724872, 0.666666666666667,
4.14159923664523, 4.231951299842, 3.9562222817766, 0.666666666666667,
3.61602795165213, 0.666666666666667, 3.58079262746603, 4.12197770915903,
4.2610646492437, 4.02152528469467, 1.0117763092792, 2.03648922832252,
0.666666666666667, 0.666666666666667, 3.8042476910097, 3.91787334748133,
0.666666666666667, 0.666666666666667, 0.89571472289964, 4.1530002677697,
3.93733212731873, 0.710671314318797, 0.666666666666667)), .Names = c("datetime", "value", "mon", "MW"), row.names = c(39113L, 12946L, 4365L, 37505L, 36601L, 31055L, 39814L, 31433L, 32105L, 20668L, 18191L, 8328L, 10232L, 25689L, 35528L, 4577L, 10302L, 5146L, 37975L, 29670L, 28429L, 7932L, 8468L, 23120L, 25111L, 39699L, 24312L, 36246L, 1556L, 11068L, 33269L, 29163L, 31685L, 32419L, 29059L, 40618L, 16751L, 11737L, 34371L, 6001L, 4864L, 16413L, 40304L, 8716L, 33190L, 5399L, 35610L, 36462L, 28338L, 10371L), class = "data.frame")
This will create the mod_data
that I am using to make the model. And just to reiterate, what I have done here is group data by month, found in the mon
column of my data, and now I want to summarize the data by month, but with the exponent included in the mean, as seen in my above code. Thanks again!
What you want isn't clear from the question. Do you want to fit a separate model for every unique month in your data? Or do you want to fit one model for all of the data and then take monthly averages of the value of MW^b
?
Here's one way on how to do the latter case.
require(minpack.lm)
require(tidyverse)
require(broom)
dat <- structure(...) # provided in the question
predictions <-
dat %>%
ungroup %>%
mutate(row = row_number()) %>%
do(augment(nlsLM(
formula = value ~ a * MW^b + 0*row,
data = .,
start = list(a = 100000, b=1/3),
upper = c(Inf, 1),
lower = c(0, 1/5)
)
)
)
joined <-
dat %>%
mutate(row = row_number()) %>%
left_join(predictions, by=c('MW', 'value', 'row')) %>%
select(-row)
joined %>%
group_by(mon) %>%
mutate(monthly_avg_prediction = mean(.fitted))
Notes:
- This kind of stuff is much easier with the broom package. This is because broom converts the results of model-finding functions like
lm
,nls
, ornlsLM
etc. into dataframes. So you don't have to memorize or re-lookup the function-specific structure of the model object (e.g.model$params[['estimate']][[1]])
or similar stuff; the model results are already in R-standard dataframe format. - My solution uses ideas from this StackOverflow answer on how to join broom-generated dataframes of predictions with original data. That's why the stuff with
row_number()
andleft_join()
are in there. Otherwise, in the general case,augment
will throw away data from the original dataframe that is not used in the model prediction, and it will not work well if there are repeated values in the data that is used. - The
.fitted
column is generated by broom'saugment
function. It is the model prediction at the indicated datapoint. - The result (that I think you might want) is in the
monthly_avg_prediction
column of thejoined
dataframe. But that represents a single, global model, fit on all the data, and predictions from that model are averaged by month.
来源:https://stackoverflow.com/questions/51752261/r-summarize-inside-an-nlslm-statement