How to create “NA” for missing data in a time series

后端未结


I have several files of data that look like this:

X code year month day pp  
1 4515 1953     6   1  0  
2 4515 1953     6   2  0  
3 4515 1953     6   3  0  
4


                      
4 Answers

情深已故, 2021-01-30 10:25

I had to deal with a similar problem for a monthly time series. I solved it by directly joining two data.table/data.frame objects on the time variable. The point is that a time series is just another kind of dataset, so you can manipulate it like any regular dataset. Here is my solution:

library(zoo)
library(data.table)
someLength <- 102  # number of months; 102 matches the output shown below
(full <- data.table(yrAndMo = as.yearmon(seq(as.Date('2008-01-01'), by = '1 month', length.out = someLength))))
# the full time horizon that you want to have
#  yrAndMo
#  1: Jan 2008
#  2: Feb 2008
#  3: Mar 2008
#  4: Apr 2008
#  5: May 2008
# ---         
# 98: Feb 2016
# 99: Mar 2016
# 100: Apr 2016
# 101: May 2016
# 102: Jun 2016

exampleDat # the actual data you want to append to the full time horizon
# yrAndMo someValue
# 1 Mar 2010      7500
# 2 Jun 2010      1115
# 3 Mar 2011      2726
# 4 Apr 2011      1865
# 5 Nov 2011      1695
# 6 Dec 2012     10000
# 7 Mar 2016      1000

library(plyr)
join(full, exampleDat, by = 'yrAndMo', type = "left")
#   yrAndMo someValue
#   1: Jan 2008        NA
#   2: Feb 2008        NA
#   3: Mar 2008        NA
#   4: Apr 2008        NA
#   5: May 2008        NA
#  ---                   
#  98: Feb 2016        NA
#  99: Mar 2016      1000
# 100: Apr 2016        NA
# 101: May 2016        NA
# 102: Jun 2016        NA


After this you can easily convert the dataset back to whichever time series class you want. I prefer read.zoo for that.
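
For readers who would rather avoid extra packages, the same left join can be sketched in base R with merge() and all.x = TRUE. This is only an illustration under assumptions: the data frame example_dat, its column names, and its values are hypothetical, and months are stored as first-of-month Date values instead of yearmon.

```r
# Hypothetical sparse monthly data; each month is stored as its first day.
example_dat <- data.frame(
  month = as.Date(c("2010-03-01", "2010-06-01", "2011-03-01")),
  some_value = c(7500, 1115, 2726)
)

# Full monthly horizon spanning the observed range.
full <- data.frame(
  month = seq(min(example_dat$month), max(example_dat$month), by = "1 month")
)

# all.x = TRUE keeps every row of 'full'; months without data get NA.
padded <- merge(full, example_dat, by = "month", all.x = TRUE)
head(padded, 4)
```

The result has one row per month of the horizon, so it can be handed to a time series constructor in the same way as the joined table above.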
                                                                        
                                                        
            
            
              
                
生来不讨喜, 2021-01-30 10:28

The seq function has a method for dates (documented in ?seq.Date) that makes it easy to generate a complete sequence of dates. For example, the following code generates 15 consecutive days starting on 25 April 2011:

start <- as.Date("2011-04-25")
full <- seq(start, by = "1 day", length.out = 15)
full

 [1] "2011-04-25" "2011-04-26" "2011-04-27" "2011-04-28" "2011-04-29"
 [6] "2011-04-30" "2011-05-01" "2011-05-02" "2011-05-03" "2011-05-04"
[11] "2011-05-05" "2011-05-06" "2011-05-07" "2011-05-08" "2011-05-09"


Now use the same principle to generate some data with "missing" rows, by generating the sequence for every 2nd day:

partial <- data.frame(
    date = seq(start, by = "2 days", length.out = 6),
    value = 1:6
)
partial

        date value
1 2011-04-25     1
2 2011-04-27     2
3 2011-04-29     3
4 2011-05-01     4
5 2011-05-03     5
6 2011-05-05     6


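To answer your question, you can combine the match function with vector subscripting to create a dataset with NAs. match returns NA for every date in full that has no counterpart in partial, and subscripting with NA yields NA: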

with(partial, value[match(full, date)])
 [1]  1 NA  2 NA  3 NA  4 NA  5 NA  6 NA NA NA NA


To combine this result with the original full data:

data.frame(Date=full, value=with(partial, value[match(full, date)]))
         Date value
1  2011-04-25     1
2  2011-04-26    NA
3  2011-04-27     2
4  2011-04-28    NA
5  2011-04-29     3
6  2011-04-30    NA
7  2011-05-01     4
8  2011-05-02    NA
9  2011-05-03     5
10 2011-05-04    NA
11 2011-05-05     6
12 2011-05-06    NA
13 2011-05-07    NA
14 2011-05-08    NA
15 2011-05-09    NA
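
Applied to data laid out like the question's (code, year, month, day, pp), the same match idea might look like the sketch below. The rows in obs are hypothetical and chosen so that 3 June 1953 is missing; the names obs, idx and padded are illustrative only.

```r
# Hypothetical rows in the layout shown in the question; 3 June is missing.
obs <- data.frame(
  code  = 4515,
  year  = 1953,
  month = 6,
  day   = c(1, 2, 4, 5),
  pp    = c(0, 0, 0, 3.5)
)

# Build a real Date column from the year/month/day parts.
obs$date <- as.Date(paste(obs$year, obs$month, obs$day, sep = "-"))

# Complete daily sequence between the first and last observation.
full <- seq(min(obs$date), max(obs$date), by = "1 day")

# idx is NA wherever a date in 'full' has no observation.
idx <- match(full, obs$date)
padded <- data.frame(date = full, code = obs$code[idx], pp = obs$pp[idx])
padded
```

Indexing every column with the same idx keeps the rows aligned, so a missing day shows up as NA in both code and pp.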

                                                                        
                                                        
            
            
              
                
天涯浪人, 2021-01-30 10:33

The first thing to note is that z.date is character, not Date.

Here's how I would solve your problem using xts (a subclass of zoo).

library(xts)
# remove the third observation from the sample data to simulate a missing day
CET <- CET[-3,]
# create an actual Date column in CET
CET$date <- as.Date(with(CET, paste(year, month, day, sep="-")))
# create an xts object using 'date' column
x <- xts(CET[,c("code","pp")], CET$date)
# now merge 'x' with a regular date sequence spanning the start/end of 'x'
X <- merge(x, timeBasedSeq(paste(start(x), end(x), sep="::")))
X
#            code  pp
# 1953-06-01 4515 0.0
# 1953-06-02 4515 0.0
# 1953-06-03   NA  NA
# 1953-06-04 4515 0.0
# 1953-06-05 4515 3.5

                                                                        
                                                        
            
            
              
                
梦谈多话, 2021-01-30 10:35

In the zoo package "regular" means that the series is equally spaced except possibly for some missing entries.  The zooreg class in the zoo package is specifically for that type of series.  Note that the set of all regular series includes the set of all equally spaced series but is strictly larger.

The is.regular function checks whether a given series is regular. That is, could the series be made equally spaced by inserting NAs for the missing entries?

Regarding your last question, it's a FAQ. See FAQ #13 in the zoo FAQ, available from the zoo CRAN page or from within R via:

vignette("zoo-faq") 


FAQ #13 also includes some illustrative code.
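
As a rough illustration of the distinction between regular and strictly regular, and of one common way to insert the NAs, here is a minimal sketch. It is not copied from the FAQ; the series z and its values are hypothetical, and it assumes the zoo package is installed.

```r
library(zoo)

# Hypothetical daily series with 3 June missing.
z <- zoo(c(0, 0, 3.5), as.Date(c("1953-06-01", "1953-06-02", "1953-06-04")))

is.regular(z)                 # regular: every gap is a multiple of one day
is.regular(z, strict = TRUE)  # not strictly regular: the spacing is unequal

# Merge with a zero-width zoo series on the full daily grid;
# days without an observation become NA.
grid <- seq(start(z), end(z), by = "day")
z_full <- merge(z, zoo(, grid))
z_full
```

Merging with a zero-width series only contributes index values, so the original data are unchanged and the gaps are filled with NA.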
                                                                        
                                                        
            
            
              
                