I have several variables at annual frequency in R that I would like to include in a regression analysis with other variables available at quarterly frequency. Additionally,
A bit late here, but the tempdisagg package does what you want. It ensures that the sum, the average, the first value or the last value of the resulting high-frequency series is consistent with the low-frequency series.
It also allows you to use external indicator series, e.g., with the Chow-Lin technique. If you don't have an indicator series, the Denton-Cholette method produces a better result than the method in EViews.
Here's your example:
# need ts object as input
z_a <- ts(c(100, 110, 111), start = 2000)
library(tempdisagg)
z_q <- predict(td(z_a ~ 1, method = "denton-cholette", conversion = "average"))
z_q
#          Qtr1      Qtr2      Qtr3      Qtr4
# 2000  97.65795  98.59477 100.46841 103.27887
# 2001 107.02614 109.71460 111.34423 111.91503
# 2002 111.42702 111.06100 110.81699 110.69499
# which has the same means as your original series:
tapply(z_q, floor(time(z_q)), mean)
# 2000 2001 2002
#  100  110  111
We could manipulate the output of na.spline to ensure that it averages to the annual values, either by shifting all 4 quarters' values or by shifting only the last 3 quarters' values. In the first case we subtract the mean of the 4 quarters from each quarter and then add the annual value to each quarter. In the second case we subtract the mean of the last 3 quarters from each of the last 3 quarters and then add the annual value, so Q1 keeps the annual value and the other three quarters also average to it.
In each case, averaging the z_q_adj values over the four quarters of a year recovers the original annual value.
Here are the two approaches mentioned:
# 1
yr <- format(time(c), "%Y")
c$z_q_adj <- ave(coredata(c$z_q), yr, FUN = function(x) x - mean(x) + x[1])
giving:
> c
z_a z_q z_q_adj
2000-01-01 100 100.0000 95.36604
2000-04-01 NA 103.4434 98.80946
2000-07-01 NA 106.4080 101.77405
2000-10-01 NA 108.6844 104.05046
2001-01-01 110 110.0000 109.39295
2001-04-01 NA 110.5723 109.96527
2001-07-01 NA 110.8719 110.26484
2001-10-01 NA 110.9840 110.37694
2002-01-01 111 111.0000 110.86116
2002-04-01 NA 111.0150 110.87615
2002-07-01 NA 111.1219 110.98311
2002-10-01 NA 111.4184 111.27958
# 2
c$z_q_adj <- ave(coredata(c$z_q), yr, FUN = function(x) c(x[1], x[-1] - mean(x[-1]) + x[1]))
giving:
> c
z_a z_q z_q_adj
2000-01-01 100 100.0000 100.0000
2000-04-01 NA 103.4434 97.2648
2000-07-01 NA 106.4080 100.2294
2000-10-01 NA 108.6844 102.5058
2001-01-01 110 110.0000 110.0000
2001-04-01 NA 110.5723 109.7629
2001-07-01 NA 110.8719 110.0625
2001-10-01 NA 110.9840 110.1746
2002-01-01 111 111.0000 111.0000
2002-04-01 NA 111.0150 110.8299
2002-07-01 NA 111.1219 110.9368
2002-10-01 NA 111.4184 111.2333
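Both adjustments can also be checked without the question's c object. This is a minimal sketch that assumes base R's stats::spline() as a stand-in for zoo's na.spline() and places each annual value on Q1 of its year. The names tq, adj1 and adj2 are illustrative, and the spline values it produces are not guaranteed to match the table above.

```r
# Assumption: stats::spline() stands in for zoo::na.spline(), and each
# annual value sits on Q1 of its year, as in the question's data.
yrs <- c(2000, 2001, 2002)
z_a <- c(100, 110, 111)

tq  <- seq(2000, 2002.75, by = 0.25)  # quarterly time points as decimal years
z_q <- spline(yrs, z_a, xout = tq)$y  # cubic spline through the annual points
yr  <- floor(tq)                      # year label for each quarter

# Approach 1: shift all four quarters so the yearly mean equals the Q1 value
adj1 <- ave(z_q, yr, FUN = function(x) x - mean(x) + x[1])

# Approach 2: keep Q1 unchanged, shift Q2-Q4 so their mean equals the Q1 value
adj2 <- ave(z_q, yr, FUN = function(x) c(x[1], x[-1] - mean(x[-1]) + x[1]))

# Both sets of yearly averages should reproduce the annual values
print(tapply(adj1, yr, mean))
print(tapply(adj2, yr, mean))
```

Approach 1 shifts each whole year by a constant, so the spline's within-year shape is preserved; approach 2 keeps Q1 at the annual figure at the cost of a level jump between Q1 and Q2.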
ADDED: If you want to know whether a series was interpolated or not, some approaches are:
- add a comment to the series, e.g. comment(c) <- "Originally annual", or
- use a naming convention, e.g. add _a to the series name if it was originally annual: c_a <- c, or
- if it's OK to retain both the c_q and c_q_adj columns, then for series that originated from quarterly data the two columns should be the same and otherwise not, or
- keep a column for both the original data and the quarterly data.
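Two of these options can be sketched in base R. This assumes the data sits in a plain data.frame (the name dat and the column interpolated are hypothetical) rather than the zoo object c; the z_q values are the first two years of the z_q column shown earlier.

```r
# Hypothetical data frame: original annual column plus interpolated quarterly column
dat <- data.frame(
  date = seq(as.Date("2000-01-01"), by = "quarter", length.out = 8),
  z_a  = c(100, NA, NA, NA, 110, NA, NA, NA),
  z_q  = c(100, 103.4434, 106.4080, 108.6844, 110, 110.5723, 110.8719, 110.9840)
)

# Option: attach a comment attribute (not shown when printing, read with comment())
comment(dat) <- "Originally annual"
print(comment(dat))

# Option: keep the original column and flag the quarters that were filled in
dat$interpolated <- is.na(dat$z_a)
print(dat)
```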
Perhaps I'm missing something here, but assuming the annual value always comes from the first quarter, couldn't you just replace mean in your aggregate call with min?
> d <- aggregate(c, as.integer(format(index(c),"%Y")), min, na.rm=TRUE)
> d
z_a z_q
2000 100 100
2001 110 110
2002 111 111
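min() gives the right answer here because each year's spline values rise from Q1, so the Q1 value is also that year's minimum. A minimal base-R sketch, using the first two years of the z_q values shown earlier plus a hypothetical series z_dip that falls within a year, shows where min() and a first-element function diverge. It assumes each year's rows are in quarter order with Q1 first.

```r
yr  <- rep(2000:2001, each = 4)
z_q <- c(100, 103.4434, 106.4080, 108.6844, 110, 110.5723, 110.8719, 110.9840)

# Both recover the Q1 values here, since z_q rises within each year
print(tapply(z_q, yr, min))
print(tapply(z_q, yr, function(x) x[1]))

# Hypothetical series that dips below its Q1 value during 2000
z_dip <- c(100, 98, 97, 99, 110, 111, 112, 113)
print(tapply(z_dip, yr, min))               # 97 for 2000, not the Q1 value
print(tapply(z_dip, yr, function(x) x[1]))  # 100 and 110
```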