Merge Panel data to get balanced panel data

夕颜 2020-11-30 14:22

I have several data frames in panel data form. Now I want to merge these panel data frames into one panel data set. The data frames have some things in common and some differences between them. I i

2 Answers
  • 2020-11-30 14:41

    Two alternatives, of which the data.table one(s) are especially of interest when speed and memory are an issue:

    base R :

    Bind the dataframes together into one:

    df3 <- rbind(df1,df2)
    

    Create a reference dataframe with all possible combinations of Month and variable with expand.grid:

    ref <- expand.grid(Month = unique(df3$Month), variable = unique(df3$variable))
    

    Merge them together with all.x=TRUE so you make sure the missing combinations are filled with NA-values:

    merge(ref, df3, by = c("Month", "variable"), all.x = TRUE)
    

    Or, joining on the first two columns by position (thanks to @PierreLafortune):

    merge(ref, df3, by=1:2, all.x = TRUE)
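
    The base R steps above can be tried end to end on a small example. This is only a sketch: the contents of df1 and df2 below are made-up toy data (the question's real data is not shown), with Month stored as plain text and a single value column Beta1 assumed for brevity.

    ```r
    # Hypothetical stand-ins for df1 and df2 (toy data, not from the question)
    df1 <- data.frame(Month = c("2005-01", "2005-02", "2005-03"),
                      variable = "A", Beta1 = 1:3,
                      stringsAsFactors = FALSE)
    df2 <- data.frame(Month = c("2005-02", "2005-03", "2005-04"),
                      variable = "B", Beta1 = 4:6,
                      stringsAsFactors = FALSE)

    # Stack the two panels, then build every Month x variable combination
    df3 <- rbind(df1, df2)
    ref <- expand.grid(Month = unique(df3$Month),
                       variable = unique(df3$variable),
                       stringsAsFactors = FALSE)

    # Left join onto the full grid: combinations absent from df3 get NA in Beta1
    balanced <- merge(ref, df3, by = c("Month", "variable"), all.x = TRUE)

    # In a balanced panel every variable has the same number of rows
    table(balanced$variable)
    ```

    Here "A" has no row for 2005-04 and "B" has none for 2005-01, so those two combinations are the ones that come back with NA.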
    

    data.table :

    Bind the dataframes into one with 'rbindlist' which returns a 'data.table':

    library(data.table)
    DT <- rbindlist(list(df1,df2))
    

    Join with a reference to ensure all combinations are present and missing ones are filled with NA:

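    # name the CJ columns explicitly, so the join does not depend on CJ's default column names (V1, V2 in older versions of data.table)
    DT[CJ(Month = Month, variable = variable, unique = TRUE), on = c("Month", "variable")]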
    

    Everything together in one call:

    DT <- rbindlist(list(df1,df2))[CJ(Month = Month, variable = variable, unique = TRUE), on = c("Month", "variable")]
    

    An alternative is wrapping rbindlist in setkey and then expanding with CJ (cross join):

    DT <- setkey(rbindlist(list(df1,df2)), Month, variable)[CJ(Month, variable, unique = TRUE)]
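
    For a quick check of the data.table route, here is a minimal sketch on made-up data. The contents of df1 and df2 and the single value column Beta1 are assumptions for illustration only. The CJ columns are named explicitly so the join does not rely on CJ's default column names.

    ```r
    library(data.table)

    # Hypothetical stand-ins for df1 and df2 (toy data, not from the question)
    df1 <- data.frame(Month = c("2005-01", "2005-02", "2005-03"),
                      variable = "A", Beta1 = 1:3,
                      stringsAsFactors = FALSE)
    df2 <- data.frame(Month = c("2005-02", "2005-03", "2005-04"),
                      variable = "B", Beta1 = 4:6,
                      stringsAsFactors = FALSE)

    # Stack the frames into one data.table
    DT <- rbindlist(list(df1, df2))

    # Cross join of the unique Months and variables, joined back onto DT;
    # grid rows with no match in DT come back with NA in Beta1
    balanced <- DT[CJ(Month = Month, variable = variable, unique = TRUE),
                   on = c("Month", "variable")]

    # Rows per variable; equal counts mean the panel is balanced
    balanced[, .N, by = variable]
    ```

    If the frames do not share exactly the same columns, rbindlist has a fill = TRUE argument that pads the missing columns with NA.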
    
  • 2020-11-30 14:54

    There's a function for that. Combine the data frames with rbind, then use complete from tidyr. It expands the data to every combination of Month and variable and fills the rows that were missing with NA:

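    library(tidyr)
    library(zoo)  # provides as.yearmon()
    df3 <- do.call(rbind.data.frame, list(df1, df2))
    # hand Month to complete() as plain text, then convert it back afterwards
    df3$Month <- as.character(df3$Month)
    df4 <- complete(df3, Month, variable)
    df4$Month <- as.yearmon(df4$Month, "%b %Y")
    # sort by variable, then chronologically within each variable
    df5 <- df4[order(df4$variable,df4$Month),]
    df5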
    # Source: local data frame [72 x 8]
    # 
    #       Month variable Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
    #      (yrmn)   (fctr) (int) (int) (int) (int) (int) (int)
    # 1  Jan 2005        A     1     2     3     4     5     6
    # 2  Feb 2005        A     2     3     4     5     6     7
    # 3  Mar 2005        A     3     4     5     6     7     8
    # 4  Apr 2005        A     4     5     6     7     8     9
    # 5  May 2005        A     5     6     7     8     9    10
    # 6  Jun 2005        A     6     7     8     9    10    11
    # 7  Jul 2005        A     7     8     9    10    11    12
    # 8  Aug 2005        A     8     9    10    11    12    13
    # 9  Sep 2005        A     9    10    11    12    13    14
    # 10 Oct 2005        A    10    11    12    13    14    15
    # ..      ...      ...   ...   ...   ...   ...   ...   ...
    

    An alternative implementation with dplyr & tidyr:

    library(dplyr)
    library(tidyr)
    
    df3 <- bind_rows(df1, df2) %>% 
      complete(Month, variable)
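
    To see what complete adds, and to fill the gaps with something other than NA, here is a small sketch on made-up data. The contents of df1 and df2, the Beta1 column and the choice of 0 as a filler are all assumptions for illustration. The fill argument of complete takes a named list of replacement values.

    ```r
    library(dplyr)
    library(tidyr)

    # Hypothetical stand-ins for df1 and df2 (toy data, not from the question)
    df1 <- data.frame(Month = c("2005-01", "2005-02", "2005-03"),
                      variable = "A", Beta1 = 1:3,
                      stringsAsFactors = FALSE)
    df2 <- data.frame(Month = c("2005-02", "2005-03", "2005-04"),
                      variable = "B", Beta1 = 4:6,
                      stringsAsFactors = FALSE)

    # Default behaviour: missing Month x variable combinations get NA
    balanced <- bind_rows(df1, df2) %>%
      complete(Month, variable) %>%
      arrange(variable, Month)

    # Variant: replace the gaps in Beta1 with 0 instead of NA
    filled <- bind_rows(df1, df2) %>%
      complete(Month, variable, fill = list(Beta1 = 0L))

    # Every variable should now have one row per Month
    count(balanced, variable)
    ```

    Whether 0 is a sensible filler depends on what the Beta columns measure; leaving NA is the safer default.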
    