Fill in missing year in ordered list of dates

天大地大妈咪最大 submitted on 2019-12-07 06:34:51

Question


I have collected some time series data from the web, and the timestamps look like this:

24 Jun 
21 Mar
20 Jan 
10 Dec
20 Jun 
20 Jan
10 Dec 
...

The interesting part is that the year is missing from the data. However, all the records are ordered (newest first), so the year can be inferred from each record's position and filled in. After imputation, the data should look like this:

24 Jun 2014
21 Mar 2014
20 Jan 2014
10 Dec 2013 
20 Jun 2013
20 Jan 2013
10 Dec 2012
...

Before I roll up my sleeves and start writing a for loop with nested logic: is there an easy way to impute the missing year that works out of the box in R?

Thanks a lot for any suggestion!


Answer 1:


Here's one idea

## Make data easily reproducible
df <- data.frame(day=c(24, 21, 20, 10, 20, 20, 10),
                 month = c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Dec"))


## Convert each month-day combo to its day of the year ("julian date").
## The placeholder year 2012 is a leap year, so 29 Feb also parses.
datestring <- paste("2012", match(df[[2]], month.abb), df[[1]], sep = "-")
date <- strptime(datestring, format = "%Y-%m-%d") 
julian <- as.integer(strftime(date, format = "%j"))

## Transitions between years occur wherever julian date increases between
## two observations
df$year <- 2014 - cumsum(diff(c(julian[1], julian))>0)

## Check that it worked
df
#   day month year
# 1  24   Jun 2014
# 2  21   Mar 2014
# 3  20   Jan 2014
# 4  10   Dec 2013
# 5  20   Jun 2013
# 6  20   Jan 2013
# 7  10   Dec 2012
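
The steps above can be wrapped in a small helper. This is only a sketch: `fill_years` and its `start_year` argument are hypothetical names introduced here, and it assumes the records are ordered newest first and that the year of the first record is known.

```r
# Hypothetical helper (not part of base R): impute years for a
# newest-first series of day/month pairs.
fill_years <- function(day, month, start_year) {
  # Placeholder leap year 2012 so that 29 Feb parses
  datestring <- paste("2012", match(month, month.abb), day, sep = "-")
  doy <- as.integer(format(as.Date(datestring, format = "%Y-%m-%d"), "%j"))
  # A rise in day-of-year between neighbours marks a step back one year
  start_year - cumsum(c(FALSE, diff(doy) > 0))
}

day   <- c(24, 21, 20, 10, 20, 20, 10)
month <- c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Dec")
year  <- fill_years(day, month, start_year = 2014)

# Assemble full Date values from the imputed years
dates <- as.Date(paste(year, match(month, month.abb), day, sep = "-"))
```

Note that this kind of inference cannot detect gaps of a year or more: if two neighbouring records are more than twelve months apart, only one year step is counted.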



Answer 2:


The OP has requested to complete the years in descending order starting in 2014.

Here is an alternative approach which works without date conversion or fake dates. Furthermore, this approach can be modified to work with fiscal years that start in a month other than January.

# create sample dataset
df <- data.frame(
  day = c(24L, 21L, 20L, 10L, 20L, 20L, 21L, 10L, 30L, 10L, 10L, 7L),
  month = c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Jan", "Dec", "Jan", 
            "Jan", "Jan", "Jun"))

df$year <- 2014 - cumsum(c(0L, diff(100L*as.integer(
  factor(df$month, levels = month.abb)) + df$day) > 0))
df
   day month year
1   24   Jun 2014
2   21   Mar 2014
3   20   Jan 2014
4   10   Dec 2013
5   20   Jun 2013
6   20   Jan 2013
7   21   Jan 2012
8   10   Dec 2011
9   30   Jan 2011
10  10   Jan 2011
11  10   Jan 2011
12   7   Jun 2010
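
To see why the one-liner works, it may help to split it into its intermediate steps. The sketch below uses only the first six records; the variable names `month_idx`, `key` and `year_break` are introduced here for illustration.

```r
month <- c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan")
day   <- c(24L, 21L, 20L, 10L, 20L, 20L)

# Month index 1..12, taken from the order of the factor levels
month_idx <- as.integer(factor(month, levels = month.abb))

# 100 * month + day sorts the same way as the date within one year
key <- 100L * month_idx + day

# In a newest-first series the key normally falls; a rise means
# the record belongs to the previous year
year_break <- c(FALSE, diff(key) > 0)
year <- 2014L - cumsum(year_break)
```

Here `key` is 624, 321, 120, 1210, 620, 120, so the only rise is from 120 to 1210 (20 Jan to 10 Dec), which is where the year drops from 2014 to 2013.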

Completion of fiscal years

Let's assume the business has decided to start its fiscal year on February 1. Thus, January lies in a different fiscal year than February or March of the same calendar year.

To handle fiscal years, we only need to rotate the factor levels so that the first month of the fiscal year (here February) comes first:

df$fy <- 2014 - cumsum(c(0L, diff(100L*as.integer(
  factor(df$month, levels = month.abb[c(2:12, 1)])) + df$day) > 0))
df
   day month year   fy
1   24   Jun 2014 2014
2   21   Mar 2014 2014
3   20   Jan 2014 2013
4   10   Dec 2013 2013
5   20   Jun 2013 2013
6   20   Jan 2013 2012
7   21   Jan 2012 2011
8   10   Dec 2011 2011
9   30   Jan 2011 2010
10  10   Jan 2011 2010
11  10   Jan 2011 2010
12   7   Jun 2010 2010
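
The rotation can be generalised to any starting month. The following is a sketch under assumptions: `fill_fiscal_years` and its `fy_start` argument are hypothetical names, and `start_year` is taken to be the fiscal-year label of the first (newest) record, following the labelling used above.

```r
# Hypothetical helper: fy_start is the month number (1..12) in which
# the fiscal year begins; fy_start = 1 gives calendar years.
fill_fiscal_years <- function(day, month, start_year, fy_start = 1L) {
  # Rotate month.abb so that the fiscal year's first month comes first
  lvls <- month.abb[(seq_len(12L) + fy_start - 2L) %% 12L + 1L]
  key <- 100L * as.integer(factor(month, levels = lvls)) + day
  # A rise in the key marks a step back to the previous fiscal year
  start_year - cumsum(c(0L, diff(key) > 0))
}

day <- c(24L, 21L, 20L, 10L, 20L, 20L, 21L, 10L, 30L, 10L, 10L, 7L)
month <- c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Jan", "Dec", "Jan",
           "Jan", "Jan", "Jun")

cal_year <- fill_fiscal_years(day, month, 2014L)                 # calendar years
fy       <- fill_fiscal_years(day, month, 2014L, fy_start = 2L)  # FY starts 1 Feb
```

With `fy_start = 2L` this should reproduce the `fy` column shown above, and with the default it should reproduce the `year` column.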


Source: https://stackoverflow.com/questions/25632652/fill-in-missing-year-in-ordered-list-of-dates
