Dataframe within dataframe?

Consider this example:

df <- data.frame(id=1:10,var1=LETTERS[1:10],var2=LETTERS[6:15])

fun.split <- function(x) tolower(as.character(x))
df$new.letter


                      
3 Answers

离开以前
2021-01-04 02:58

The reason is that you assigned the 2-column matrix returned by apply to a single new column. So the result is a single column that holds a matrix. You can convert it back to a normal data.frame with

 do.call(data.frame, df)


A more straightforward method is to assign 2 columns. I use lapply instead of apply because the columns can be of different classes: apply first converts its input to a matrix, so with mixed classes every column ends up as 'character'. lapply works on each column separately, returns a list, and preserves the class of each column.

df[paste0('new.letters', names(df)[2:3])] <- lapply(df[2:3], fun.split)
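
To make the difference concrete, here is a minimal sketch of both approaches. The question's assignment line is cut off, so the `apply` call below is an assumption about what was run; `flat` and `df2` are names introduced only for this illustration.

```r
df <- data.frame(id = 1:10, var1 = LETTERS[1:10], var2 = LETTERS[6:15])
fun.split <- function(x) tolower(as.character(x))

# ASSUMPTION: the truncated line in the question looked roughly like this
df$new.letters <- apply(df[, 2:3], 2, fun.split)

is.matrix(df$new.letters)  # the new column is one 10 x 2 matrix
ncol(df)                   # so the data frame still has 4 columns

# Flatten the matrix column into ordinary columns
flat <- do.call(data.frame, df)
names(flat)

# Alternative: start from a clean copy and assign two columns with lapply
df2 <- df[1:3]
df2[paste0("new.letters", names(df2)[2:3])] <- lapply(df2[2:3], fun.split)
names(df2)
```

Both `flat` and `df2` end up with five ordinary columns; only the generated column names differ.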

                                                                        
                                                        
            
            
              
                
一生所求
2021-01-04 02:59

In this case R doesn't behave as one might expect, but digging a little deeper explains it. What is a data frame? As Norman Matloff says in his book (chapter 5):


  a data frame is a list, with the components of that list being
  equal-length vectors
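
A small sketch of what that definition means in practice, using a toy data frame (`toy` is a name made up for this illustration):

```r
toy <- data.frame(id = 1:3, var1 = c("A", "B", "C"))

is.list(toy)   # a data frame is a list under the hood
typeof(toy)    # the underlying type is "list"
lengths(toy)   # every component has the same length

# A matrix with a matching number of rows also qualifies as one component
m <- matrix(letters[1:6], nrow = 3)
NROW(m) == nrow(toy)
```

The last line hints at why a whole matrix can sit inside a single column: only its number of rows has to match.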


The following code might be useful to understand.

class(df$new.letters)
[1] "matrix"


str(df)
'data.frame':   10 obs. of  4 variables:
 $ id         : int  1 2 3 4 5 6 7 8 9 10
 $ var1       : Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
 $ var2       : Factor w/ 10 levels "F","G","H","I",..: 1 2 3 4 5 6 7 8 9 10
 $ new.letters: chr [1:10, 1:2] "a" "b" "c" "d" ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "var1" "var2"


Maybe the reason why it looks strange is in the print methods. Consider this:

colnames(df$new.letters)
[1] "var1" "var2"


Maybe there is something in the print methods that combines the sub-names of objects and displays them all.

For example here the vectors that constitute the df are:

names(df)
[1] "id"          "var1"        "var2"        "new.letters"


but in this case the component new.letters also has a dim attribute (in fact it is a matrix) whose columns have the names var1 and var2 too. See this code:

attributes(df$new.letters)
$dim
[1] 10  2

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
[1] "var1" "var2"


but when we print, we see all of them as if they were separate vectors (and so separate columns of the data.frame!).
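
The mismatch between what the data frame holds and what gets displayed can be checked directly. This is a sketch under two assumptions: the `apply` call reproduces the truncated assignment from the question, and the printed layout corresponds to the character matrix that `as.matrix()` builds from the data frame.

```r
df <- data.frame(id = 1:10, var1 = LETTERS[1:10], var2 = LETTERS[6:15])
fun.split <- function(x) tolower(as.character(x))

# ASSUMPTION: reconstruction of the truncated assignment in the question
df$new.letters <- apply(df[, 2:3], 2, fun.split)

length(df)        # 4 components in the underlying list
names(df)

# Converting to a matrix expands the matrix column into two columns
m <- as.matrix(df)
ncol(m)           # 5
colnames(m)
```

The column names of `m` combine the component name with the matrix's own column names, which matches the headers seen when the data frame is printed.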

Edit: Print methods

Out of curiosity, and to improve this answer, I looked inside the methods of the print function:

methods(print)


The previous code produces a very long list of methods for the generic function print. I did not spot one for data.frame at first (note that base R does define print.data.frame, so it should appear in that list). The one that looked relevant to me, although there is surely a more rigorous way to find out, is listof.

getS3method("print", "listof")
function (x, ...) 
{
    nn <- names(x)
    ll <- length(x)
    if (length(nn) != ll) 
        nn <- paste("Component", seq.int(ll))
    for (i in seq_len(ll)) {
        cat(nn[i], ":\n")
        print(x[[i]], ...)
        cat("\n")
    }
    invisible(x)
}
<bytecode: 0x101afe1c8>
<environment: namespace:base>


Maybe I am wrong, but it seems to me that this code might contain useful information about why that happens, specifically at the check if (length(nn) != ll).
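
A more direct way to see which print method applies to a data frame is to look at its class and ask for the matching S3 method. This is only a sketch; `toy` and `print_df` are names introduced here.

```r
toy <- data.frame(id = 1:3, var1 = c("A", "B", "C"))

class(toy)               # S3 dispatch is driven by this class vector
inherits(toy, "listof")  # FALSE, so print.listof is not the method used

# Retrieve the method registered for class "data.frame"
print_df <- utils::getS3method("print", "data.frame")
is.function(print_df)
```

Printing `print_df` shows its source, which is where the formatting of matrix columns can be traced further.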
                                                                        
                                                        
            
            
              
                
长发绾君心
2021-01-04 03:05

@akrun solved 90% of my problem. But I had data.frames buried within data.frames, buried within data.frames and so on, without knowing the depth to which this was happening.

In this case, I thought sharing my recursive solution might be helpful to others searching this thread as I was:

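    unnest_dataframes <- function(x) {

        # Flatten one level of nesting
        y <- do.call(data.frame, x)

        # Recurse while nested data.frames remain, keeping the recursive result
        if (any(vapply(y, is.data.frame, logical(1)))) y <- unnest_dataframes(y)

        y

    }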

    new_data <- unnest_dataframes(df)
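
As an illustration of the idea, here is a self-contained sketch on a small example nested two levels deep. The helper name `flatten_df` and the example objects `inner`, `outer_df`, and `flat` are hypothetical; the loop is an iterative variant of the same repeated `do.call(data.frame, ...)` step.

```r
# Build a data frame that contains a data frame that contains a data frame
inner <- data.frame(x = 1:2)
inner$deep <- data.frame(p = c("a", "b"), q = c(TRUE, FALSE))
outer_df <- data.frame(id = 1:2)
outer_df$inner <- inner

# Hypothetical helper: flatten one level at a time until nothing is nested
flatten_df <- function(x) {
  while (any(vapply(x, is.data.frame, logical(1)))) {
    x <- do.call(data.frame, x)
  }
  x
}

flat <- flatten_df(outer_df)
names(flat)
```

Each pass removes one level of nesting, so the number of passes equals the nesting depth and does not need to be known in advance.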


This approach sometimes runs into problems of its own. In that case it can help to separate all columns of class "data.frame" from the original data set, unnest them, and then cbind() everything back together like so:

  # Find all columns that are data.frame
  # Assuming your data frame is stored in variable 'y'
  data.frame.cols <- unname(vapply(y, is.data.frame, logical(1)))
  # drop = FALSE keeps a data.frame even if only one column is selected
  z <- y[, !data.frame.cols, drop = FALSE]

  # All columns of class "data.frame"
  dfs <- y[, data.frame.cols, drop = FALSE]

  # Recursively unnest each of these columns
  unnest_dataframes <- function(x) {
    y <- do.call(data.frame, x)
    if (any(vapply(y, is.data.frame, logical(1)))) {
      # Keep the result of the recursive call; otherwise it is discarded
      y <- unnest_dataframes(y)
    } else {
      cat('Nested data.frames successfully unpacked\n')
    }
    y
  }

  df2 <- unnest_dataframes(dfs)

  # Combine with original data
  all_columns <- cbind(z, df2)

                                                                        
                                                        
            
            
              
                