I have collected some time series data from the web, and the timestamps I got look like this:
24 Jun
21 Mar
20 Jan
10 Dec
20 Jun
20 Jan
10 Dec
...
The interesting part is that the year is missing from the data. However, all the records are ordered from newest to oldest, so the year can be inferred from the sequence of records and filled in. After imputation, the data should look like this:
24 Jun 2014
21 Mar 2014
20 Jan 2014
10 Dec 2013
20 Jun 2013
20 Jan 2013
10 Dec 2012
...
Before I roll up my sleeves and start writing a for loop with nested logic: is there an easy way, ideally something that works out of the box in R, to impute the missing year?
Thanks a lot for any suggestions!
Here's one idea:
## Make data easily reproducible
df <- data.frame(day=c(24, 21, 20, 10, 20, 20, 10),
month = c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Dec"))
## Convert each month-day combo to its day of the year ("julian date", 1-366),
## using 2012 as a dummy year (a leap year, so 29 Feb is also valid)
datestring <- paste("2012", match(df[[2]], month.abb), df[[1]], sep = "-")
date <- strptime(datestring, format = "%Y-%m-%d")
julian <- as.integer(strftime(date, format = "%j"))
## Records run from newest to oldest, so a transition to the previous year
## occurs wherever the julian date increases between two consecutive observations
df$year <- 2014 - cumsum(diff(c(julian[1], julian))>0)
## Check that it worked
df
#   day month year
# 1  24   Jun 2014
# 2  21   Mar 2014
# 3  20   Jan 2014
# 4  10   Dec 2013
# 5  20   Jun 2013
# 6  20   Jan 2013
# 7  10   Dec 2012
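The same steps can be wrapped in a small reusable function. This is only a sketch: the name `fill_year_julian` and its arguments are hypothetical, and like the code above it assumes the records run from newest to oldest with no gap of a full year or more between consecutive records (such a gap cannot be detected from day and month alone).

```r
# Hypothetical helper wrapping the julian-date approach above.
# Assumes records are ordered newest first and that consecutive records
# are never a full year or more apart.
fill_year_julian <- function(day, month, latest_year) {
  # Dummy leap year 2012, so that 29 Feb also parses
  datestring <- paste("2012", match(month, month.abb), day, sep = "-")
  julian <- as.integer(strftime(strptime(datestring, format = "%Y-%m-%d"),
                                format = "%j"))
  # Going backwards in time, an increase in day of year means we crossed
  # into the previous year; equal days stay in the same year
  latest_year - cumsum(diff(c(julian[1], julian)) > 0)
}

day   <- c(24, 21, 20, 10, 20, 20, 10)
month <- c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Dec")
fill_year_julian(day, month, 2014)
# [1] 2014 2014 2014 2013 2013 2013 2012
```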
The OP wants the years filled in descending order, starting from 2014.
Here is an alternative approach that needs neither date conversion nor fake dates. It can also be adapted to fiscal years that start in a month other than January.
# create sample dataset
df <- data.frame(
day = c(24L, 21L, 20L, 10L, 20L, 20L, 21L, 10L, 30L, 10L, 10L, 7L),
month = c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Jan", "Dec", "Jan",
"Jan", "Jan", "Jun"))
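# Encode each record as 100 * month number + day, which increases through the
# calendar year. Records run newest to oldest, so every increase between
# consecutive records marks the step back into the previous year.
df$year <- 2014 - cumsum(c(0L, diff(100L*as.integer(
  factor(df$month, levels = month.abb)) + df$day) > 0))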
df
   day month year
1   24   Jun 2014
2   21   Mar 2014
3   20   Jan 2014
4   10   Dec 2013
5   20   Jun 2013
6   20   Jan 2013
7   21   Jan 2012
8   10   Dec 2011
9   30   Jan 2011
10  10   Jan 2011
11  10   Jan 2011
12   7   Jun 2010
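To see why the one-liner works, it can help to print the intermediate values. Each record is encoded as `100 * month + day`, which orders dates within a calendar year; because the records run backwards in time, every positive step in that key marks the move into an earlier year, while equal keys (the two `10 Jan` rows) stay in the same year. The variable names `key` and `new_year` are only for illustration; `c(FALSE, ...)` plays the same role as the `c(0L, ...)` padding in the original.

```r
day   <- c(24L, 21L, 20L, 10L, 20L, 20L, 21L, 10L, 30L, 10L, 10L, 7L)
month <- c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Jan", "Dec", "Jan",
           "Jan", "Jan", "Jun")

# Calendar-order key: month number times 100 plus day of month
key <- 100L * as.integer(factor(month, levels = month.abb)) + day
key
# [1]  624  321  120 1210  620  120  121 1210  130  110  110  607

# TRUE wherever the key increases, i.e. a year boundary was crossed
new_year <- c(FALSE, diff(key) > 0)

# Count boundaries crossed so far and subtract from the latest year
2014 - cumsum(new_year)
# [1] 2014 2014 2014 2013 2013 2013 2012 2011 2011 2011 2011 2010
```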
Completion of fiscal years
Let's assume the business has decided to start its fiscal year on February 1. Thus, January lies in a different fiscal year than February or March of the same calendar year.
To handle fiscal years, we only need to rotate the factor levels so that the first month of the fiscal year (February) comes first:
df$fy <- 2014 - cumsum(c(0L, diff(100L*as.integer(
factor(df$month, levels = month.abb[c(2:12, 1)])) + df$day) > 0))
df
   day month year   fy
1   24   Jun 2014 2014
2   21   Mar 2014 2014
3   20   Jan 2014 2013
4   10   Dec 2013 2013
5   20   Jun 2013 2013
6   20   Jan 2013 2012
7   21   Jan 2012 2011
8   10   Dec 2011 2011
9   30   Jan 2011 2010
10  10   Jan 2011 2010
11  10   Jan 2011 2010
12   7   Jun 2010 2010
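The level rotation generalises to any starting month. The sketch below is a hypothetical helper (`fill_year` and its `start_month` argument are not from the answer): `start_month = 1` gives calendar years, and `start_month = 2` reproduces the February fiscal year, with each fiscal year labelled as in the `fy` column above. It assumes newest-first ordering and no gaps of a year or more between records.

```r
# Hypothetical helper: fill in (fiscal) years for records ordered newest first.
fill_year <- function(day, month, latest_year, start_month = 1L) {
  # Rotate month abbreviations so the first month of the (fiscal) year comes first
  month_order <- month.abb[c(start_month:12, seq_len(start_month - 1))]
  idx <- as.integer(factor(month, levels = month_order))
  if (anyNA(idx)) stop("unrecognised month abbreviation")
  key <- 100L * idx + day
  # Each increase in the key marks the step back into an earlier year
  latest_year - cumsum(c(0L, diff(key) > 0))
}

day   <- c(24L, 21L, 20L, 10L, 20L, 20L, 21L, 10L, 30L, 10L, 10L, 7L)
month <- c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Jan", "Dec", "Jan",
           "Jan", "Jan", "Jun")
fill_year(day, month, 2014)                    # calendar years
fill_year(day, month, 2014, start_month = 2L)  # fiscal years starting 1 Feb
```

Keeping the start month as a parameter means the same ordering key serves both cases; only the factor levels change.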
Source: https://stackoverflow.com/questions/25632652/fill-in-missing-year-in-ordered-list-of-dates