Why does rbindlist not respect column names?

灰色年华 2021-01-17 21:46

I just discovered this bug, only to find that some people are calling it a "feature". This makes rbindlist NOT like do.call("rbind",l) as

1 answer
  • 2021-01-17 22:23

    This feature is now implemented in commit 1266 of v1.9.3. From NEWS:

    o  'rbindlist' gains 'use.names' and 'fill' arguments and is now implemented 
       entirely in C. Closes #5249    
      -> use.names by default is FALSE for backwards compatibility (doesn't bind by 
         names by default)
      -> rbind(...) now just calls rbindlist() internally, except that 'use.names' 
         is TRUE by default, for compatibility with base (and backwards compatibility).
      -> fill by default is FALSE. If fill is TRUE, use.names has to be TRUE.
      -> At least one item of the input list has to have non-null column names.
      -> Duplicate columns are bound in the order of occurrence, like base.
      -> Attributes that might exist in individual items would be lost in the bound result.
      -> Columns are coerced to the highest SEXPTYPE, if they are different, if/when possible.
      -> And incredibly fast ;).
      -> Documentation updated in much detail. Closes DR #5158.
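
The coercion rule in the NEWS entry above can be illustrated with a minimal sketch. The table names (`ints`, `dbls`, `chrs`) are made up for this example, and it assumes a data.table version that includes the rewritten rbindlist:

```r
library(data.table)

# Hypothetical one-column tables that differ only in column type
ints <- data.table(a = 1L)    # integer
dbls <- data.table(a = 2.5)   # double
chrs <- data.table(a = "x")   # character

# integer + double: the bound column is promoted to double
num_result <- rbindlist(list(ints, dbls))
class(num_result$a)

# integer + character: the bound column is promoted to character
chr_result <- rbindlist(list(ints, chrs))
class(chr_result$a)
```

In both cases the result column takes the "higher" of the two input types rather than raising an error.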
    

    With this, you can set use.names=TRUE to bind by names; it is FALSE by default for backwards compatibility. Alternatively, use rbind(...), where use.names is TRUE by default, for compatibility with base.
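
To make the difference concrete, here is a small sketch comparing binding by position, binding by name, and rbind(). It passes use.names=FALSE explicitly rather than relying on the default, since the default is an assumption that may differ between data.table versions:

```r
library(data.table)

DT1 <- data.table(x = 1, y = 2)
DT2 <- data.table(y = 1, x = 2)

# Bind by position: DT2's first column (y) lands under x
by_position <- rbindlist(list(DT1, DT2), use.names = FALSE)

# Bind by name: DT2's columns are matched to x and y
by_name <- rbindlist(list(DT1, DT2), use.names = TRUE)

# rbind() on data.tables binds by name
via_rbind <- rbind(DT1, DT2)

by_position$x   # 1 1
by_name$x       # 1 2
via_rbind$x     # 1 2
```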

    See this post for more examples and this post for benchmarks.

    Examples:

    1) Just set use.names=TRUE

    DT1 <- data.table(x=1, y=2)
    DT2 <- data.table(y=1, x=2)
    
    rbindlist(list(DT1,DT2), use.names=TRUE, fill=FALSE)
    #    x y
    # 1: 1 2
    # 2: 2 1
    
    DT1 <- data.table(x=1, y=2)
    DT2 <- data.table(z=2, y=1)
    
    # errors with fill=FALSE: the column sets differ (x,y vs z,y), so binding needs fill=TRUE
    rbindlist(list(DT1, DT2), use.names=TRUE, fill=FALSE)
    # Error in rbindlist(list(DT1, DT2), use.names = TRUE, fill = FALSE) : 
        # Answer requires 3 columns whereas one or more item(s) in the input 
        # list has only 2 columns. ...
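
One way to avoid hitting this error is to compare the column sets before binding. The sketch below uses a hypothetical helper, `bind_by_name`, which is not part of data.table; it assumes each table has unique column names and only turns on fill when the inputs disagree:

```r
library(data.table)

# Hypothetical helper (not part of data.table): always bind by name,
# and enable fill only when the inputs do not share one column set.
bind_by_name <- function(tables) {
  all_cols <- unique(unlist(lapply(tables, names)))
  same_cols <- all(vapply(
    tables,
    function(dt) setequal(names(dt), all_cols),
    logical(1)
  ))
  if (!same_cols) {
    message("Column sets differ; missing columns will be filled with NA")
  }
  rbindlist(tables, use.names = TRUE, fill = !same_cols)
}

DT1 <- data.table(x = 1, y = 2)
DT2 <- data.table(z = 2, y = 1)

res <- bind_by_name(list(DT1, DT2))
names(res)   # "x" "y" "z"
```

Whether to fill silently or stop with an error is a design choice; the message here is just a reminder that NA values were introduced.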
    

    2) Also binds duplicate column names in the order of occurrence:

    DT1 <- data.table(x=1, x=2, y=10, y=20, y=30)
    DT2 <- data.table(y=-10, x=-2, y=-20, x=-1, y=-30)
    
    rbindlist(list(DT1,DT2), use.names=TRUE)
    
    #     x  x   y   y   y
    # 1:  1  2  10  20  30
    # 2: -2 -1 -10 -20 -30
    

    3) Use fill=TRUE if you want to bind by names and fill missing columns:

    DT1 <- data.table(x=1, y=2)
    DT2 <- data.table(y=2, z=-1)
    
    rbindlist(list(DT1, DT2), use.names=TRUE, fill=TRUE)
    #     x y  z
    # 1:  1 2 NA
    # 2: NA 2 -1
    

    HTH
