Forecast with ggplot2 and funggcast function

Question

On this website, Mr. Davenport published a function to plot an ARIMA forecast with ggplot2, using an arbitrary dataset as an example (published here). I can follow his example without any error message.

When I use my own data, however, I end up with these warnings:

1: In window.default(x, ...) : 'end' value not changed
2: In window.default(x, ...) : 'end' value not changed

I know the warnings appear when I call pd <- funggcast(yt, yfor), and that they are related to the end = c(2013) I specify for my data, but I do not know how to fix them.

This is the code I use:

library(ggplot2)
library(zoo)
library(forecast)

myts <- ts(rnorm(54), start = c(1960), end = c(2013), freq = 1) # 1960-2013 is 54 annual values
funggcast <- function(dn, fcast){

en <- max(time(fcast$mean)) # Extract the max date used in the forecast

# Extract Source and Training Data
ds <- as.data.frame(window(dn, end = en))
names(ds) <- 'observed'
ds$date <- as.Date(time(window(dn, end = en)))

# Extract the Fitted Values (need to figure out how to grab confidence intervals)
dfit <- as.data.frame(fcast$fitted)
dfit$date <- as.Date(time(fcast$fitted))
names(dfit)[1] <- 'fitted'

ds <- merge(ds, dfit, all.x = T) # Merge fitted values with source and training data

# Extract the Forecast values and confidence intervals
dfcastn <- as.data.frame(fcast)
dfcastn$date <- as.Date(as.yearmon(row.names(dfcastn)))
names(dfcastn) <- c('forecast','lo80','hi80','lo95','hi95','date')

pd <- merge(ds, dfcastn,all.x = T) # final data.frame for use in ggplot
return(pd)

} 

yt <- window(myts, end = c(2013)) # extract training data until last year
yfit <- auto.arima(myts) # fit arima model
yfor <- forecast(yfit) # forecast
pd <- funggcast(yt, yfor) # extract the data for ggplot using function funggcast()

ggplot(data = pd, aes(x = date, y = observed)) +
    geom_line(color = "red") +
    geom_line(aes(y = fitted), color = "blue") +
    geom_line(aes(y = forecast)) +
    geom_ribbon(aes(ymin = lo95, ymax = hi95), alpha = .25) +
    scale_x_date(name = "Time in Decades") +
    scale_y_continuous(name = "GDP per capita (current US$)") +
    theme(axis.text.x = element_text(size = 10), legend.justification = c(0, 1), legend.position = c(0, 1)) +
    ggtitle("Arima(0,1,1) Fit and Forecast of GDP per capita for Brazil (1960-2013)") +
    scale_color_manual(values = c("Blue", "Red"), breaks = c("Fitted", "Data", "Forecast"))

Edit: I found another blog here with a function to use with forecast and ggplot2, but I would prefer the approach above if I could find my mistake. Anyone?

Edit2: If I run your updated code with my data here, then I get the graph shown below. Note that I did not change the end = c(2023) for myts; otherwise it would not merge the forecast values with the fitted values.

myts <- ts(WDI_gdp_capita$Brazil, start = c(1960), end = c(2023), freq = 1)

funggcast <- function(dn, fcast){

  en <- max(time(fcast$mean)) # Extract the max date used in the forecast

  # Extract Source and Training Data
  ds <- as.data.frame(window(dn, end = en))
  names(ds) <- 'observed'
  ds$date <- as.Date(time(window(dn, end = en)))

  # Extract the Fitted Values (need to figure out how to grab confidence intervals)
  dfit <- as.data.frame(fcast$fitted)
  dfit$date <- as.Date(time(fcast$fitted))
  names(dfit)[1] <- 'fitted'

  ds <- merge(ds, dfit, all = T) # Merge fitted values with source and training data

  # Extract the Forecast values and confidence intervals
  dfcastn <- as.data.frame(fcast)
  dfcastn$date <- as.Date(paste(row.names(dfcastn),"01","01",sep="-"))
  names(dfcastn) <- c('forecast','lo80','hi80','lo95','hi95','date')

  pd <- merge(ds, dfcastn,all.x = T) # final data.frame for use in ggplot
  return(pd)

} # ggplot function by Frank Davenport

yt <- window(myts, end = c(2013)) # extract training data until last year
yfit <- auto.arima(yt) # fit arima model
yfor <- forecast(yfit) # forecast
pd <- funggcast(myts, yfor) # extract the data for ggplot using function funggcast()

p <- ggplot(data = pd, aes(x = date, y = observed)) +
    geom_line(color = "red") +
    geom_line(aes(y = fitted), color = "blue") +
    geom_line(aes(y = forecast)) +
    geom_ribbon(aes(ymin = lo95, ymax = hi95), alpha = .25) +
    scale_x_date(name = "Time in Decades") +
    scale_y_continuous(name = "GDP per capita (current US$)") +
    theme(axis.text.x = element_text(size = 10), legend.justification = c(0, 1), legend.position = c(0, 1)) +
    ggtitle("Arima(0,1,1) Fit and Forecast of GDP per capita for Brazil (1960-2013)") +
    scale_color_manual(values = c("Blue", "Red"), breaks = c("Fitted", "Data", "Forecast"))
p

# ggsave() is not a layer, so call it separately instead of adding it with +
ggsave(filename = "gdp_forecast_ggplot.pdf", plot = p, width = 330, height = 180, units = "mm", dpi = 300, limitsize = TRUE)

The almost perfect graph I get:

One additional question: How can I get a legend in this graph?

If I set end = c(2013) for myts, I get the same graph as in the beginning:

Answer 1:

There are several differences between Mr. Davenport's analysis and the plot you are trying to make. The first is that he compares the ARIMA forecast to some observed data, which is why he trains the model on only a portion of the whole time series, the training set. To do this, you should make your initial time series longer:

myts <- ts(rnorm(64), start = c(1960), end = c(2023), freq = 1) # 1960-2023 is 64 annual values

Then, at the end of your script, select the training data up to 2013:

yt <- window(myts, end = c(2013)) # extract training data until last year

The model should be trained on the training set, not the whole time series, so you should change the yfit line to:

yfit <- auto.arima(yt) # fit arima model

And call the funggcast function using the whole time series, because it needs the observed and fitted data:

pd <- funggcast(myts, yfor)
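
The warning from the question can be traced to window() inside funggcast: it is asked for data up to the last forecast year, but the series passed in stops at 2013. A minimal base R sketch reproduces it, using a made-up five-year series named short (an illustrative assumption, not the asker's data):

```r
# Toy annual series covering 2009-2013 (illustrative data only)
short <- ts(1:5, start = 2009, frequency = 1)

# Asking for an end beyond the data triggers the warning from the question
msg <- tryCatch(
  window(short, end = 2023),
  warning = function(w) conditionMessage(w)
)
print(msg)

# The result is just the unchanged series, which is why it is only a warning
clipped <- suppressWarnings(window(short, end = 2023))
print(end(clipped))

# With extend = TRUE the series is padded with NA up to the requested end
padded <- window(short, end = 2023, extend = TRUE)
print(length(padded))
```

When the series handed to funggcast really does run up to the last forecast year, window() has nothing to clip and the warning should not appear.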

Finally, he uses dates that have a month and a year, whereas your data are annual, so in the funggcast function, change this line:

dfcastn$date <- as.Date(as.yearmon(row.names(dfcastn)))

To:

dfcastn$date <- as.Date(paste(row.names(dfcastn),"01","01",sep="-"))

This is because the forecast periods are labelled by year only and have to be converted to dates (2014 becomes 2014-01-01, for example) in order to be merged with the observed data.
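
Why the dates must line up can be sketched in base R alone. The data frames obs and fc and their values are invented for illustration; only the paste()/as.Date() conversion is taken from the answer:

```r
# Observed data keyed by first-of-year dates (invented values)
obs <- data.frame(
  date = as.Date(c("2012-01-01", "2013-01-01", "2014-01-01")),
  observed = c(10.2, 10.9, 11.4)
)

# Forecast rows labelled by year only, mimicking an annual forecast table
fc <- data.frame(forecast = c(11.1, 11.6), row.names = c("2014", "2015"))

# Convert "2014" to the Date 2014-01-01 so both tables share a Date key
fc$date <- as.Date(paste(row.names(fc), "01", "01", sep = "-"))

# Full outer join on date: cells without a match become NA
pd <- merge(obs, fc, by = "date", all = TRUE)
print(pd)
```

If the two date columns differed by even one day, the merge would match nothing and the forecast rows would simply be appended with NA in the observed column.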

After all the changes, the code looks like this:

library(ggplot2)
library(zoo)
library(forecast)

myts <- ts(rnorm(54), start = c(1960), end = c(2013), freq = 1) # 1960-2013 is 54 annual values
funggcast <- function(dn, fcast){

        en <- max(time(fcast$mean)) # Extract the max date used in the forecast

        # Extract Source and Training Data
        ds <- as.data.frame(window(dn, end = en))
        names(ds) <- 'observed'
        ds$date <- as.Date(time(window(dn, end = en)))

        # Extract the Fitted Values (need to figure out how to grab confidence intervals)
        dfit <- as.data.frame(fcast$fitted)
        dfit$date <- as.Date(time(fcast$fitted))
        names(dfit)[1] <- 'fitted'

        ds <- merge(ds, dfit, all.x = TRUE) # Merge fitted values with source and training data

        # Extract the Forecast values and confidence intervals
        dfcastn <- as.data.frame(fcast)
        dfcastn$date <- as.Date(paste(row.names(dfcastn),"01","01",sep="-"))
        names(dfcastn) <- c('forecast','lo80','hi80','lo95','hi95','date')

        pd <- merge(ds, dfcastn, all = TRUE) # final data.frame for use in ggplot
        return(pd)

} 

yt <- window(myts, end = c(2013)) # extract training data until last year
yfit <- auto.arima(yt) # fit arima model
yfor <- forecast(yfit) # forecast
pd <- funggcast(myts, yfor) # extract the data for ggplot using function funggcast()

plotData <- ggplot(data = pd, aes(x = date, y = observed)) +
        geom_line(aes(color = "1")) +
        geom_line(aes(y = fitted, color = "2")) +
        geom_line(aes(y = forecast, color = "3")) +
        scale_colour_manual(values = c("red", "blue", "black"), labels = c("Observed", "Fitted", "Forecasted"), name = "Data") +
        geom_ribbon(aes(ymin = lo95, ymax = hi95), alpha = .25) +
        scale_x_date(name = "Time in Decades") +
        scale_y_continuous(name = "GDP per capita (current US$)") +
        theme(axis.text.x = element_text(size = 10)) +
        ggtitle("Arima(0,1,1) Fit and Forecast of GDP per capita for Brazil (1960-2013)")

plotData

You get a plot that looks like this. The fit is pretty bad because the time series is completely random. ggplot will also emit some warnings, because the forecast line has no data before 2013 and the fitted data do not continue after 2013. (I ran it several times; depending on the initial random time series, the model might just predict 0 everywhere.)

Edit: I changed the pd assignment line as well, in case there are no observed data after 2013.

Edit2: I changed the ggplot call at the end of the code to make sure the legend shows up.
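
The legend appears because colour is mapped inside aes() rather than set as a fixed argument; ggplot2 only draws a legend for mapped aesthetics. Here is a minimal sketch with invented data (the data frame toy and the label strings are illustrative assumptions). Naming the entries of values ties each colour to its label regardless of order:

```r
library(ggplot2)

# Invented data standing in for the merged data frame pd
toy <- data.frame(
  date = as.Date(paste(2010:2015, "01", "01", sep = "-")),
  observed = c(1.0, 1.4, 1.3, 1.8, NA, NA),
  forecast = c(NA, NA, NA, 1.8, 2.0, 2.2)
)

p <- ggplot(toy, aes(x = date)) +
  # A constant string inside aes() creates a legend entry for the layer
  geom_line(aes(y = observed, colour = "Observed"), na.rm = TRUE) +
  geom_line(aes(y = forecast, colour = "Forecast"), na.rm = TRUE) +
  # Named values match colours to labels by name, not by position
  scale_colour_manual(name = "Data",
                      values = c(Observed = "red", Forecast = "black"))

print(p)
```

Setting na.rm = TRUE on each line layer is one way to silence the missing-value warnings mentioned above.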

Answer 2:

There is a package called ggfortify, available via GitHub, which allows forecast objects to be plotted directly with ggplot2. It is described at http://rpubs.com/sinhrks/plot_ts

Answer 3:

This is a bump on a rather old post, but there is a function on GitHub that produces some nice results.

Here's the code as it was on Aug 03, 2016:

function(forec.obj, data.color = 'blue', fit.color = 'red', forec.color = 'black',
                           lower.fill = 'darkgrey', upper.fill = 'grey', format.date = F)
{
    serie.orig = forec.obj$x
    serie.fit = forec.obj$fitted
    pi.strings = paste(forec.obj$level, '%', sep = '')

    if(format.date)
        dates = as.Date(time(serie.orig))
    else
        dates = time(serie.orig)

    serie.df = data.frame(date = dates, serie.orig = serie.orig, serie.fit = serie.fit)

    forec.M = cbind(forec.obj$mean, forec.obj$lower[, 1:2], forec.obj$upper[, 1:2])
    forec.df = as.data.frame(forec.M)
    colnames(forec.df) = c('forec.val', 'l0', 'l1', 'u0', 'u1')

    if(format.date)
        forec.df$date = as.Date(time(forec.obj$mean))
    else
        forec.df$date = time(forec.obj$mean)

    p = ggplot() + 
        geom_line(aes(date, serie.orig, colour = 'data'), data = serie.df) + 
        geom_line(aes(date, serie.fit, colour = 'fit'), data = serie.df) + 
        scale_y_continuous() +
        geom_ribbon(aes(x = date, ymin = l0, ymax = u0, fill = 'lower'), data = forec.df, alpha = I(0.4)) + 
        geom_ribbon(aes(x = date, ymin = l1, ymax = u1, fill = 'upper'), data = forec.df, alpha = I(0.3)) + 
        geom_line(aes(date, forec.val, colour = 'forecast'), data = forec.df) + 
        scale_color_manual('Series', values=c('data' = data.color, 'fit' = fit.color, 'forecast' = forec.color)) + 
        scale_fill_manual('P.I.', values=c('lower' = lower.fill, 'upper' = upper.fill))

    if (format.date)
        p = p + scale_x_date()

    p
}
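
The function relies on a handful of components of a forecast object. A short sketch shows what those components look like with the default settings, assuming the forecast package is installed and using a simulated random walk as stand-in data:

```r
library(forecast)

set.seed(42)
# Simulated annual random walk, 1960-2013 (stand-in data, not real GDP)
y <- ts(cumsum(rnorm(54)), start = 1960, frequency = 1)

fc <- forecast(auto.arima(y), h = 10)

fc$level              # prediction interval levels, 80 and 95 by default
dim(fc$lower)         # one row per forecast step, one column per level
length(fc$fitted)     # one fitted value per observation of the original series
range(time(fc$x))     # original series: 1960 to 2013
range(time(fc$mean))  # point forecasts: 2014 to 2023
```

With these defaults, the columns the function names l0 and u0 hold the 80% bounds, and l1 and u1 hold the 95% bounds.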

Source: https://stackoverflow.com/questions/28244929/forecast-with-ggplot2-and-funggcast-function

Tags

ggplot2

forecasting