Difference between as.data.frame(x) and data.frame(x)

再見小時候 2020-12-24 01:44

What is the difference between as.data.frame(x) and data.frame(x)?

In the following example, the result is the same except for the column names.
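The asker's example code is missing from this page. As a minimal sketch, assuming a small numeric matrix `x` without column names (a hypothetical stand-in for the lost example), the naming difference looks like this:

```r
# Hypothetical stand-in for the missing example: a 2 x 3 matrix with no dimnames
x <- matrix(1:6, nrow = 2)

df_built   <- data.frame(x)     # columns are named X1, X2, X3
df_coerced <- as.data.frame(x)  # columns are named V1, V2, V3

print(names(df_built))
print(names(df_coerced))
```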

         


        
  • 2020-12-24 01:58

    Looking at the code, as.data.frame fails fast: where data.frame issues a warning and does things like drop duplicated row names, as.data.frame stops with an error:

    > x <- matrix(data=rep(1,9),nrow=3,ncol=3)
    > rownames(x) <- c("a", "b", "b")
    > data.frame(x)
      X1 X2 X3
    1  1  1  1
    2  1  1  1
    3  1  1  1
    Warning message:
    In data.row.names(row.names, rowsi, i) :
      some row.names duplicated: 3 --> row.names NOT used
    
    > as.data.frame(x)
    Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
      duplicate row.names: b
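    If the duplicated row names are accidental, one way to make both functions agree is to de-duplicate them first with base R's make.unique(). A minimal sketch, reusing the matrix from this answer:

```r
x <- matrix(data = rep(1, 9), nrow = 3, ncol = 3)
rownames(x) <- c("a", "b", "b")

# make.unique() appends ".1", ".2", ... to repeated values: "a" "b" "b.1"
rownames(x) <- make.unique(rownames(x))

df_built   <- data.frame(x)     # row names are kept, no duplicate warning
df_coerced <- as.data.frame(x)  # no duplicate row.names error
print(rownames(df_coerced))
```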
    
  • 2020-12-24 01:59

    Try

    colnames(x) <- c("C1","C2","C3")
    

    and then both will give the same result

    identical(data.frame(x), as.data.frame(x))
    

    What is more startling are things like the following:

    list(x)
    

    provides a one-element list, the element being the matrix x, whereas

    as.list(x)
    

    gives a list with 9 elements, one for each matrix entry
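    The same build-versus-coerce split is easy to check with length(). A small sketch, assuming x is a 3 x 3 matrix as in the question:

```r
x <- matrix(1:9, nrow = 3)

wrapped <- list(x)     # builds a list whose single component is the whole matrix
coerced <- as.list(x)  # coerces the matrix itself: one element per entry

print(length(wrapped))  # 1
print(length(coerced))  # 9
```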

    MM

  • 2020-12-24 02:12

    The difference becomes clearer when you look at their main arguments:

    • as.data.frame(x, ...): check if object is a data frame, or coerce if possible. Here, "x" can be any R object.
    • data.frame(...): build a data frame. Here, "..." allows specifying all the components (i.e. the variables of the data frame).

    So the results in Ophelia's example are similar because both functions received a single matrix as their argument. However, when these functions receive two (or more) vectors, the distinction becomes clearer:

    > # Set seed for reproducibility
    > set.seed(3)
    
    > # Create one int vector
    > IDs <- 1:10
    > IDs
     [1]  1  2  3  4  5  6  7  8  9 10
    > # Create one char vector
    > types <- sample(c("A", "B"), 10, replace = TRUE)
    > types
     [1] "A" "B" "A" "A" "B" "B" "A" "A" "B" "B"
    
    > # Try to use "as.data.frame" to coerce components into a dataframe
    > dataframe_1 <- as.data.frame(IDs, types)
    > # Look at the result
    > dataframe_1
    Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
      duplicate row.names: A, B
    > # Inspect result with head
    > head(dataframe_1, n = 10)
        IDs
    A     1
    B     2
    A.1   3
    A.2   4
    B.1   5
    B.2   6
    A.3   7
    A.4   8
    B.3   9
    B.4  10
    > # Check the structure
    > str(dataframe_1)
    'data.frame':   10 obs. of  1 variable:
     $ IDs: int  1 2 3 4 5 6 7 8 9 10
    
    > # Use instead "data.frame" to build a data frame starting from two components
    > dataframe_2 <- data.frame(IDs, types) 
    > # Look at the result
    > dataframe_2
       IDs types
    1    1     A
    2    2     B
    3    3     A
    4    4     A
    5    5     B
    6    6     B
    7    7     A
    8    8     A
    9    9     B
    10  10     B
    > # Inspect result with head
    > head(dataframe_2, n = 10)
       IDs types
    1    1     A
    2    2     B
    3    3     A
    4    4     A
    5    5     B
    6    6     B
    7    7     A
    8    8     A
    9    9     B
    10  10     B
    > # Check the structure
    > str(dataframe_2)
    'data.frame':   10 obs. of  2 variables:
     $ IDs  : int  1 2 3 4 5 6 7 8 9 10
     $ types: Factor w/ 2 levels "A","B": 1 2 1 1 2 2 1 1 2 2
    

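    As you can see, "data.frame()" works fine, while "as.data.frame()" produces an error: it treats only the first argument, IDs, as the object to be checked and coerced, and the second argument, types, is matched to its row.names parameter. The repeated values "A" and "B" therefore become duplicate row names, which is why printing fails and why head() shows them de-duplicated as A.1, B.1, and so on.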

    To sum up, "as.data.frame()" should be used to convert (coerce) a single R object into a data frame (as you correctly did with a matrix), while "data.frame()" should be used to build a data frame from scratch.
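    Two quick checks back this up. In the sketch below, the types vector is a hypothetical deterministic stand-in for the sample() call above, chosen only so the example is reproducible. It shows that the generic's second formal argument is row.names, and that bundling the vectors into one named list gives as.data.frame() the single object it expects. Note also that since R 4.0.0 stringsAsFactors defaults to FALSE, so on current R types stays a character column rather than the factor shown in the str() output above.

```r
# The generic's signature: the second positional argument is row.names
print(names(formals(as.data.frame)))  # "x" "row.names" "optional" "..."

IDs <- 1:10
# Hypothetical stand-in for sample(c("A", "B"), 10, replace = TRUE)
types <- rep(c("A", "B"), length.out = 10)

# Build from components
dataframe_2 <- data.frame(IDs, types)

# Coerce one object: wrap the vectors in a single named list first
dataframe_3 <- as.data.frame(list(IDs = IDs, types = types))

str(dataframe_3)
```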

  • 2020-12-24 02:14

    As you noted, the result does differ slightly, and this means that they are not exactly equal:

    identical(data.frame(x),as.data.frame(x))
    [1] FALSE
    

    So you might need to take care to be consistent in which one you use.
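    To confirm that the mismatch is only in the column names, one sketch is to compare the columns with their names stripped (assuming, as in the question, an unnamed numeric matrix x):

```r
x <- matrix(1:9, nrow = 3)

a <- data.frame(x)
b <- as.data.frame(x)

print(identical(a, b))  # FALSE: names are X1, X2, X3 versus V1, V2, V3

# Drop the names and compare the column contents only
same_values <- identical(unname(as.list(a)), unname(as.list(b)))
print(same_values)
```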

    But it is also worth noting that as.data.frame is faster:

    library(microbenchmark)
    microbenchmark(data.frame(x),as.data.frame(x))
    Unit: microseconds
                 expr    min     lq median      uq     max neval
        data.frame(x) 71.446 73.616  74.80 78.9445 146.442   100
     as.data.frame(x) 25.657 27.631  28.42 29.2100  93.155   100
    
    y <- matrix(1:1e6,1000,1000)
    microbenchmark(data.frame(y),as.data.frame(y))
    Unit: milliseconds
                 expr      min       lq   median       uq       max neval
        data.frame(y) 17.23943 19.63163 23.60193 41.07898 130.66005   100
     as.data.frame(y) 10.83469 12.56357 14.04929 34.68608  38.37435   100
    
  • 2020-12-24 02:16

    data.frame() can be used to build a data frame, while as.data.frame() can only be used to coerce other objects to a data frame.

    For example:

    # data.frame()
    df1 <- data.frame(matrix(1:12,3,4),1:3)
    
    # as.data.frame(): the second argument, 1:3, is matched to row.names
    df2 <- as.data.frame(matrix(1:12,3,4),1:3)
    
    df1
    #   X1 X2 X3 X4 X1.3
    # 1  1  4  7 10    1
    # 2  2  5  8 11    2
    # 3  3  6  9 12    3
    
    df2
    #   V1 V2 V3 V4
    # 1  1  4  7 10
    # 2  2  5  8 11
    # 3  3  6  9 12
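    If the goal is to coerce first and then add a column, one option is cbind(), which dispatches to its data-frame method when its first argument is a data frame. A sketch (the column name extra is hypothetical):

```r
m <- matrix(1:12, 3, 4)

df1 <- data.frame(m, 1:3)     # 5 columns: 1:3 becomes a column named X1.3
df2 <- as.data.frame(m, 1:3)  # 4 columns: 1:3 is used as row.names

# Coerce the matrix, then append the vector as a named column
df3 <- cbind(as.data.frame(m), extra = 1:3)
print(names(df3))
```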
    
  • 2020-12-24 02:19

    As Jaap mentioned, data.frame() calls as.data.frame() internally, and there's a reason for that:

    as.data.frame() is a generic function for coercing other objects to class data.frame. If you're writing your own package, you would put the method that converts an object of your_class into as.data.frame.your_class(). Here are just a few of the existing methods:

    methods(as.data.frame)
     [1] as.data.frame.AsIs            as.data.frame.Date           
     [3] as.data.frame.POSIXct         as.data.frame.POSIXlt        
     [5] as.data.frame.aovproj*        as.data.frame.array          
     [7] as.data.frame.character       as.data.frame.complex        
     [9] as.data.frame.data.frame      as.data.frame.default        
    [11] as.data.frame.difftime        as.data.frame.factor         
    [13] as.data.frame.ftable*         as.data.frame.integer        
    [15] as.data.frame.list            as.data.frame.logLik*        
    [17] as.data.frame.logical         as.data.frame.matrix         
    [19] as.data.frame.model.matrix    as.data.frame.numeric        
    [21] as.data.frame.numeric_version as.data.frame.ordered        
    [23] as.data.frame.raw             as.data.frame.table          
    [25] as.data.frame.ts              as.data.frame.vector         
    
       Non-visible functions are asterisked
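    A minimal sketch of such a method, for a hypothetical S3 class your_class (the constructor new_your_class() and the field names are illustrative only). Because data.frame() calls as.data.frame() on each of its arguments, the same method is also picked up when the object is passed to data.frame():

```r
# Hypothetical constructor: a list holding two parallel vectors, tagged "your_class"
new_your_class <- function(ids, labels) {
  structure(list(ids = ids, labels = labels), class = "your_class")
}

# S3 method: as.data.frame() dispatches here for objects of class "your_class"
as.data.frame.your_class <- function(x, row.names = NULL, optional = FALSE, ...) {
  data.frame(id = x$ids, label = x$labels,
             row.names = row.names, stringsAsFactors = FALSE)
}

obj <- new_your_class(1:3, c("a", "b", "c"))

df <- as.data.frame(obj)  # direct coercion through the method
df_via <- data.frame(obj) # data.frame() reaches the same method internally
print(df)
```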
    