Merge data frames whilst summing common columns in R

星月不相逢 2021-01-06 00:00

My problem is very similar to the one posted here.

The difference is that they knew which columns would conflict, whereas I need a generic method that won't k

3 Answers
  •  离开以前
    2021-01-06 00:21

    A data.table solution:

    library(data.table)
    
    dt1 <- data.table(read.table(header=TRUE, text="Date             Time    ColumnA    ColumnB
    01/01/2013      08:00      10         30
    01/01/2013      08:30      15         25
    01/01/2013      09:00      20         20
    02/01/2013      08:00      25         15
    02/01/2013      08:30      30         10
    02/01/2013      09:00      35         5"))
    
    dt2 <- data.table(read.table(header=TRUE, text="Date           ColumnA    ColumnB    ColumnC
    01/01/2013      100        300         1
    02/01/2013      200        400         2"))
    
    setkey(dt1, "Date")
    setkey(dt2, "Date")
    # Note: the ColumnC assignment has to come before the summing operations,
    # otherwise it throws an error (see below)
    dt1[dt2, `:=`(ColumnC = i.ColumnC, ColumnA = ColumnA + i.ColumnA, 
                            ColumnB = ColumnB + i.ColumnB)]
    
    #          Date  Time ColumnA ColumnB ColumnC
    # 1: 01/01/2013 08:00     110     330       1
    # 2: 01/01/2013 08:30     115     325       1
    # 3: 01/01/2013 09:00     120     320       1
    # 4: 02/01/2013 08:00     225     415       2
    # 5: 02/01/2013 08:30     230     410       2
    # 6: 02/01/2013 09:00     235     405       2
    

    I'm not sure why placing the ColumnC assignment last throws this error. Perhaps Matthew Dowle could explain the cause.

    dt1[dt2, `:=`(ColumnA = ColumnA + i.ColumnA, ColumnB = ColumnB + i.ColumnB, 
                            ColumnC = i.ColumnC)]
    
    Error in `[.data.table`(dt1, dt2, `:=`(ColumnA = ColumnA + i.ColumnA,  : 
      Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'NULL'
    

    Update from v1.8.9:

    o Mixing adding new with updating existing columns into one `:=`() by group; i.e.,
    DT[,`:=`(existingCol=...,newCol=...), by=...]
    now works without error or segfault, #2778 and #2528. Many thanks to Arun for reporting both with reproducible examples. Tests added.
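
The solution above names ColumnA, ColumnB and ColumnC explicitly, while the question asks for a method that does not need to know the conflicting columns in advance. One possible way to generalise it is sketched here: the shared columns are found with intersect(), the rows are matched on the key with match(), and the columns are updated by reference with data.table's set(). The function name add_matching_columns and its key argument are hypothetical, introduced only for this illustration, and the sketch assumes that the key is unique in the second table and that all shared non-key columns are numeric.

```r
library(data.table)

# Hypothetical helper (not part of data.table): sums every non-key column
# that both tables share and copies over columns that exist only in dt2.
# Modifies dt1 by reference. Assumes `key` is unique in dt2.
add_matching_columns <- function(dt1, dt2, key = "Date") {
  common   <- setdiff(intersect(names(dt1), names(dt2)), key)  # columns to sum
  new_cols <- setdiff(names(dt2), names(dt1))                  # columns to add
  idx <- match(dt1[[key]], dt2[[key]])  # row of dt2 for each row of dt1
  for (col in common)   set(dt1, j = col, value = dt1[[col]] + dt2[[col]][idx])
  for (col in new_cols) set(dt1, j = col, value = dt2[[col]][idx])
  invisible(dt1)
}

# Same data as in the answer above, built directly instead of via read.table
dt1 <- data.table(
  Date    = rep(c("01/01/2013", "02/01/2013"), each = 3),
  Time    = rep(c("08:00", "08:30", "09:00"), times = 2),
  ColumnA = c(10, 15, 20, 25, 30, 35),
  ColumnB = c(30, 25, 20, 15, 10, 5)
)
dt2 <- data.table(
  Date    = c("01/01/2013", "02/01/2013"),
  ColumnA = c(100, 200),
  ColumnB = c(300, 400),
  ColumnC = c(1, 2)
)

add_matching_columns(dt1, dt2, key = "Date")
print(dt1)
```

Rows of dt1 whose key has no match in dt2 end up with NA in the summed and added columns, because match() returns NA for them; whether that is the desired behaviour depends on the use case.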
