I have a database of metrics grouped by day, and I need to forecast the data for the next 3 months. The data have seasonality (I believe the seasonality is by day of the week).
I want to use the Holt-Winters method in R, so I need to create a time series object, which asks for a frequency (I think it is 7). But how can I be sure? Is there a function to identify the best frequency?
I'm using:
FID_TS <- ts(FID_DataSet$Value, frequency=7)
FID_TS_Observed <- HoltWinters(FID_TS)
If I decompose this data with decompose(FID_TS), I get:
[Figure: output of decompose(FID_TS)]
And this is my first forecast, FID_TS_Observed:
[Figure: Holt-Winters forecast from FID_TS_Observed]
Looking at the history of the last year, the values start low in the first 3 months, increase from month 3 to month 11, and then decrease again.
Maybe my daily data have both a weekly seasonality (frequency=7) and a monthly seasonality (frequency≈30)? Do I need the last 365 days?
Is there any way to set the frequency both by day of the week and by month? Also, does it make any difference whether I use the whole last year or just part of it in the Holt-Winters method?
Thanks in advance :)
Usually, the frequency (or seasonality; you seem to use the two words interchangeably in your post) is determined by domain knowledge. For example, if I work in the restaurant business and analyze an hourly data set of customers, I know there will be a 24-hour frequency, with spikes at lunch and dinner time, and another 168-hour frequency (24 * 7) because my customers also follow a weekly pattern.
If for some reason you don't have domain knowledge, you can use the ACF and the PACF, as well as Fourier (spectral) analysis, to find the best frequencies for your data.
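In R, that domain knowledge is encoded through the frequency argument of ts(). A minimal sketch for the restaurant example, assuming made-up Poisson customer counts for two weeks of hourly data: the same vector can be given a daily (24) or a weekly (168) season, and cycle() shows which position in the season each observation occupies.

```r
# Hypothetical hourly customer counts covering two weeks (336 hours)
set.seed(123)
hours <- 24 * 7 * 2
customers <- rpois(hours, lambda = 20)

daily_cycle  <- ts(customers, frequency = 24)      # season = hour of the day
weekly_cycle <- ts(customers, frequency = 24 * 7)  # season = hour of the week

# cycle() gives each observation's position within its season
range(cycle(daily_cycle))   # 1 24
range(cycle(weekly_cycle))  # 1 168
```

A ts object carries exactly one frequency, which is why a single ts cannot express the 24-hour and 168-hour patterns at the same time.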
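A rough sketch of the ACF and Fourier approaches using only base R, assuming a simulated daily series (a small trend, a made-up day-of-week pattern, and noise) in place of FID_DataSet. The ACF peaks at the seasonal lag, and the tallest peak of the periodogram from spec.pgram() sits at frequency 1/period. The forecast package's findfrequency() automates a similar spectral approach if you prefer a single call.

```r
# Hypothetical daily series: small trend + fixed weekly pattern + noise
set.seed(42)
n <- 140                                   # 20 full weeks
pattern <- c(10, 12, 11, 13, 20, 25, 8)    # made-up day-of-week effect
y <- 100 + 0.05 * seq_len(n) + rep(pattern, length.out = n) + rnorm(n, sd = 1)

# ACF: a weekly cycle shows up as peaks at lags 7, 14, 21, ...
acf_vals <- acf(y, lag.max = 21, plot = FALSE)$acf[-1]  # drop lag 0
best_acf_lag <- which.max(acf_vals)                      # position = lag here

# Periodogram (Fourier analysis): the tallest peak gives the dominant period.
# fast = FALSE avoids zero-padding, so frequency 1/7 falls exactly on a bin;
# spec.pgram() removes a linear trend by default.
pg <- spec.pgram(y, plot = FALSE, fast = FALSE)
best_period <- round(1 / pg$freq[which.max(pg$spec)])

best_acf_lag  # 7
best_period   # 7
```

With real data, look at the whole ACF and periodogram rather than only the maximum: harmonics of the true period (for example 7/2) and long cycles can also produce peaks.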
Is there any way to set the frequency both by day of the week and by month?
With Holt-Winters, no. HW handles only one seasonal component. For multiple seasonal components, you should try TBATS. As Xiaoxi Wu pointed out, FB Prophet can model multiple seasonalities, and so can Google's BSTS package.
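TBATS and Prophet both represent seasonality with trigonometric (Fourier) terms. To illustrate the idea of fitting two seasonal periods side by side, here is a plain harmonic regression with lm(); this is a swapped-in technique, not TBATS or Prophet (it has no trend, no changing seasonality, and no error model). The data are simulated, and fourier_terms() and make_X() are hypothetical helpers written for this sketch.

```r
# Hypothetical daily series (two years) with a weekly and a yearly cycle
set.seed(1)
n <- 730
day <- seq_len(n)
y <- 50 + 5 * sin(2 * pi * day / 7) + 10 * sin(2 * pi * day / 365.25) + rnorm(n)

# K sine/cosine pairs for one seasonal period (a truncated Fourier series)
fourier_terms <- function(day, period, K, prefix) {
  out <- do.call(cbind, lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * day / period), cos(2 * pi * k * day / period))
  }))
  colnames(out) <- paste0(prefix, c("_s", "_c"), rep(seq_len(K), each = 2))
  out
}

# Weekly terms (3 pairs) and yearly terms (2 pairs) in one design matrix
make_X <- function(day) {
  cbind(fourier_terms(day, 7, 3, "week"), fourier_terms(day, 365.25, 2, "year"))
}

fit <- lm(y ~ make_X(day))

# Forecast roughly 3 months (90 days) ahead by extending the regressors
fc <- predict(fit, newdata = data.frame(day = (n + 1):(n + 90)))
```

With the forecast package, the approach the answer recommends would start from msts(y, seasonal.periods = c(7, 365.25)) and pass that object to tbats().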
Also, does it make any difference whether I use the whole last year or just part of it in the Holt-Winters method?
Yes, it does. If you want to model a seasonality, you need at least two full seasonal periods of data (preferably more); otherwise your model has no way of knowing whether a spike is a seasonal variation or just a one-time impulse. For example, to model a weekly seasonality you need at least 14 days of training data (plus whatever you hold out for testing), and for a yearly seasonality you need at least 730 days of data.
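Base R enforces this rule directly. A small sketch, assuming a made-up day-of-week pattern: decompose(), which HoltWinters() also uses on its first start.periods (default 2) periods to get starting values, refuses a series shorter than two seasonal periods.

```r
# Hypothetical day-of-week pattern, repeated to build short and long series
pattern <- c(10, 12, 11, 13, 20, 25, 8)

short_ts <- ts(rep(pattern, length.out = 13), frequency = 7)  # under 2 weeks
long_ts  <- ts(rep(pattern, length.out = 28), frequency = 7)  # 4 weeks

# Capture the error message instead of stopping the script
short_result <- tryCatch(decompose(short_ts),
                         error = function(e) conditionMessage(e))
long_result <- decompose(long_ts)

print(short_result)  # "time series has no or less than 2 periods"
```

The same check applies to yearly seasonality on daily data: ts(x, frequency = 365) needs at least 730 observations before decompose() or a seasonal HoltWinters() fit will run.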
It looks like you have daily data and would like to forecast the next three months. The question is whether you need daily, weekly, or just monthly forecasts. I guess you will probably need daily or weekly forecasts. If you need weekly forecasts, it might be easier to first aggregate the data by week and then run the forecast.
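Aggregating by week can be done in base R. A minimal sketch, assuming FID_DataSet has a Date column called date (the same assumption the Prophet code makes) and using a 28-day made-up stand-in that starts on a Monday:

```r
# Hypothetical stand-in for FID_DataSet: 28 days starting Monday 2018-01-01
FID_DataSet <- data.frame(
  date  = seq(as.Date("2018-01-01"), by = "day", length.out = 28),
  Value = rep(c(10, 12, 11, 13, 20, 25, 8), 4)
)

# Label each day with the Monday that starts its week
FID_DataSet$week <- cut(FID_DataSet$date, breaks = "week", start.on.monday = TRUE)

# Weekly totals (use FUN = mean instead for weekly averages)
weekly <- aggregate(Value ~ week, data = FID_DataSet, FUN = sum)

# Weekly series with a yearly cycle of about 52 weeks
weekly_ts <- ts(weekly$Value, frequency = 52)
```

Note that a seasonal Holt-Winters fit on weekly_ts with frequency 52 would need at least 104 weeks of data, per the two-period rule above.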
A very good tool for daily data is Facebook's new Prophet package. It works with a data frame instead of a ts object, which makes it much easier to handle. Out of the box it fits weekly and yearly seasonality (plus daily seasonality if you have sub-daily data, such as hourly), and you can inspect them quickly with built-in functions like prophet_plot_components (plot_components in Python). A monthly cycle is not included by default, but it can be added with add_seasonality(). Facebook has a quick start tutorial, and there are APIs for both Python and R.
Here is some quick code to plot the forecast and the weekly and yearly seasonality (if there is any) with Prophet.
library(prophet)
library(dplyr)
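# Prophet needs the columns named ds and y; this assumes FID_DataSet has a column called date
df <- FID_DataSet %>% rename(ds = date, y = Value)
m <- prophet(df) # fit with the default seasonalities
future <- make_future_dataframe(m, periods = 90) # extend about 3 months (90 days) past the data
forecast <- predict(m, future)
plot(m, forecast) # plot out the forecast
prophet_plot_components(m, forecast) # plot out the components: trend, weekly and yearly seasonality if there is any.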
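If you also want to test for a monthly cycle, Prophet's R API lets you add a custom seasonality before fitting. A sketch under these assumptions: simulated data stand in for FID_DataSet (already renamed to ds and y), and period = 30.5 with fourier.order = 5 are illustrative choices, not tuned values.

```r
library(prophet)

# Hypothetical stand-in for the renamed FID_DataSet: two years of daily data
set.seed(7)
dates <- seq(as.Date("2016-01-01"), by = "day", length.out = 730)
day <- seq_along(dates)
df <- data.frame(
  ds = dates,
  y  = 100 + 5 * sin(2 * pi * day / 7) + 3 * sin(2 * pi * day / 30.5) + rnorm(730)
)

# Create the model without fitting, add a monthly seasonality, then fit
m <- prophet(weekly.seasonality = TRUE, yearly.seasonality = TRUE)
m <- add_seasonality(m, name = "monthly", period = 30.5, fourier.order = 5)
m <- fit.prophet(m, df)

# Forecast about 3 months (90 days) ahead; the forecast gains a "monthly" column
future <- make_future_dataframe(m, periods = 90)
forecast <- predict(m, future)
prophet_plot_components(m, forecast)  # now includes a monthly panel
```

If the monthly panel in the components plot is essentially flat, that is a hint the data do not carry a meaningful monthly cycle.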
Source: https://stackoverflow.com/questions/49173818/how-to-identify-the-best-frequency-in-a-time-series