Recursively ensuring tibbles instead of data frames when parsing/manipulating nested JSON

庸人自扰 2021-01-22 16:29

I have to deal with JSON documents that contain nested documents and at some level have an array which in turn contains individual documents…

1 Answer
  • 2021-01-22 17:05

    I guess you're going to have to use recursion to go through the list. Here's an idea I had, but I could only get it to work with fromJSON from the rjson package rather than the jsonlite package.
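    One likely reason the jsonlite route is harder (an assumption; the answer does not say why): with its default settings, jsonlite::fromJSON() already simplifies an array of objects into a plain data.frame, and depth() only treats tibbles as leaves, so it walks into that data frame's columns. The snippet below, using a made-up JSON string, shows the default simplification and the simplifyVector = FALSE option, which keeps every JSON array and object as an R list. That is closer to what rjson returns, but not identical: rjson turns arrays of scalars into atomic vectors, while jsonlite with simplifyVector = FALSE leaves them as lists.

```r
library(jsonlite)

# Made-up example: an array of two objects with the same fields
json <- '[{"x": "A", "y": 1, "z": true}, {"x": "B", "y": 2, "z": false}]'

# Default settings: the array of objects becomes a plain data.frame
simplified <- fromJSON(json)
class(simplified)
#> [1] "data.frame"

# simplifyVector = FALSE: arrays and objects stay nested R lists
as_lists <- fromJSON(json, simplifyVector = FALSE)
length(as_lists)
#> [1] 2
str(as_lists[[1]])
```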

    The first step is to define a recursive function to check the depth of a list element:

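    library(tibble)
    
    # Nesting depth: 0 for atomic values and tibbles, else 1 + deepest child
    depth <- function(list_entry)
    {
      if (is.list(list_entry) && !is_tibble(list_entry))
      {
        if (length(list_entry) == 0)
          return(1)  # empty list (e.g. JSON []): max() would fail on it
        return(max(sapply(list_entry, depth)) + 1)
      }
      0
    }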
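    To make the depth values concrete, here is depth() applied to a few small, made-up inputs. The function is redefined in the snippet (using is_tibble(), && and a guard for empty lists) so it runs on its own. Note that a plain data.frame counts as depth 1, because only tibbles are treated as leaves.

```r
library(tibble)

# depth() redefined so this snippet is self-contained
depth <- function(list_entry) {
  if (is.list(list_entry) && !is_tibble(list_entry)) {
    if (length(list_entry) == 0) return(1)  # empty list
    return(max(sapply(list_entry, depth)) + 1)
  }
  0
}

depth("1234")                               # 0: atomic value
depth(list(x = "A", y = 1))                 # 1: list of atomic values
depth(list(list(x = "A"), list(x = "B")))   # 2: list of records
depth(tibble(x = c("A", "B")))              # 0: tibbles are leaves
depth(data.frame(x = c("A", "B")))          # 1: plain data frames are not
```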
    

    The next function recursively tries to make a tibble out of depth-1 elements (when the values come as vectors) or out of depth-2 elements (when the values are listed individually, one record at a time). A depth-0 element is returned unchanged, a depth-1 element that cannot be turned into a tibble is flattened with unlist(), and an element that is more than two levels deep, or a depth-2 element that is not suitable for a tibble, has its child nodes passed back through the same function.

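    recursive_tibble <- function(json_list)
    {
      lapply(json_list, function(y)
      {
        d <- depth(y)  # compute the nesting depth once per element
    
        # Depth 0: an atomic value (or an existing tibble); return it unchanged
        if (d == 0)
          return(y)
    
        if (d == 1)
        {
          if (length(y) < 2)
            return(unlist(y))
    
          # All entries share one name: treat each entry as a row
          if (length(unique(names(y))) == 1)
            return(as_tibble(do.call("rbind", lapply(y, unlist))))
    
          # All entries have the same length: treat each entry as a column
          if (length(unique(lengths(y))) == 1)
            return(as_tibble(do.call("cbind", lapply(y, unlist))))
    
          # Not tibble-shaped: flatten to a vector
          return(unlist(y))
        }
    
        # Depth 2: several records that all have the same field names become
        # one tibble with a row per record (compares any number of records)
        if (d == 2 && length(y) >= 2 &&
            length(unique(lapply(y, names))) == 1)
          return(as_tibble(do.call("rbind", lapply(y, unlist))))
    
        # Deeper, or a depth-2 list that is not tibble-shaped: recurse
        recursive_tibble(y)
      })
    }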
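    The two tibble-building branches can be tried on their own. In this sketch (made-up values) the same data arrives once as column vectors, the depth-1 shape handled by the cbind() branch, and once as individual records, the depth-2 shape handled by the rbind() branch; both calls give a 2-by-2 tibble of character columns.

```r
library(tibble)

# Depth-1 shape: each entry is a whole column
as_columns <- list(x = c("A", "B"), y = c("1", "2"))
from_columns <- as_tibble(do.call("cbind", lapply(as_columns, unlist)))

# Depth-2 shape: each entry is one record (row) with the same field names
as_rows <- list(list(x = "A", y = "1"), list(x = "B", y = "2"))
from_rows <- as_tibble(do.call("rbind", lapply(as_rows, unlist)))

from_columns
from_rows
```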
    

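    So now, with x being the list that rjson's fromJSON() returns for the JSON in the question, you can look at the result with str():
    
    str(recursive_tibble(x))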
    #> List of 2
    #>  $ :List of 5
    #>   ..$ _id      : chr "1234"
    #>   ..$ createdAt: chr "2020-01-13 09:00:00"
    #>   ..$ labels   : chr [1:2] "label-a" "label-b"
    #>   ..$ levelOne :List of 1
    #>   .. ..$ levelTwo:List of 1
    #>   .. .. ..$ levelThree:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
    #>   .. .. .. ..$ x: chr [1:2] "A" "B"
    #>   .. .. .. ..$ y: chr [1:2] "1" "2"
    #>   .. .. .. ..$ z: chr [1:2] "TRUE" "FALSE"
    #>   ..$ schema   : chr "0.0.1"
    #>  $ :List of 5
    #>   ..$ _id      : chr "5678"
    #>   ..$ createdAt: chr "2020-01-13 09:01:00"
    #>   ..$ labels   : chr [1:2] "label-a" "label-b"
    #>   ..$ levelOne :List of 1
    #>   .. ..$ levelTwo:List of 1
    #>   .. .. ..$ levelThree:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
    #>   .. .. .. ..$ x: chr [1:2] "A" "B"
    #>   .. .. .. ..$ y: chr [1:2] "1" "2"
    #>   .. .. .. ..$ z: chr [1:2] "TRUE" "FALSE"
    #>   ..$ schema   : chr "0.0.1"
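    Because rbind() on unlist()ed records produces a character matrix, y and z come out as chr ("1", "TRUE") in the output above. If the original types matter, one alternative (a different technique from the answer's, sketched under the assumption that every record has the same field names and one scalar value per field) is to build each column with sapply() before calling as_tibble(). records_to_tibble() and level_three are hypothetical names, and the numeric and logical values assume the JSON held numbers and booleans there.

```r
library(tibble)

# Hypothetical helper: list of records with identical field names -> tibble,
# one column per field; sapply() keeps each field's own type
records_to_tibble <- function(records) {
  fields <- names(records[[1]])
  columns <- lapply(setNames(fields, fields), function(field)
    sapply(records, function(record) record[[field]]))
  as_tibble(columns)
}

# Hypothetical input shaped like levelThree, with numbers and booleans
level_three <- list(
  list(x = "A", y = 1, z = TRUE),
  list(x = "B", y = 2, z = FALSE)
)

str(records_to_tibble(level_three))
```

    Swapping this helper in for the depth-2 rbind() call in recursive_tibble() should keep y numeric and z logical, as long as the assumptions above hold.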
    
    
    