I have to deal with JSON documents that contain nested documents and at some level have an array which in turn contains individual docu
I guess you're going to have to use recursion to go through the list. Here's an idea I had, but I could only get it to work with fromJSON
from the rjson package rather than the jsonlite package.
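As far as I can tell, the difference comes down to how the two parsers treat an array of objects. Here is a minimal sketch of that, using a made-up JSON string (not the one from the question) and assuming both packages are installed:

```r
# Hypothetical input: an array of two small objects
txt <- '[{"x": "A", "y": 1}, {"x": "B", "y": 2}]'

# Both packages export fromJSON, so call each one explicitly with ::
from_rjson    <- rjson::fromJSON(txt)
from_jsonlite <- jsonlite::fromJSON(txt)

# rjson keeps the array as a plain list of named lists
is.data.frame(from_rjson)     # FALSE
length(from_rjson)            # 2

# jsonlite, with its default settings, simplifies it into a data frame
is.data.frame(from_jsonlite)  # TRUE
```

The functions below expect the plain nested-list form, which is what rjson returns here.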
The first step is to define a recursive function to check the depth of a list element:
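library(tibble)  # provides is_tibble() and as_tibble()

depth <- function(list_entry)
{
  # A plain list is one level deeper than its deepest child;
  # tibbles and atomic values count as leaves (depth 0).
  # max(0, ...) keeps an empty list from raising an error.
  if (is.list(list_entry) && !is_tibble(list_entry))
    return(max(0, vapply(list_entry, depth, numeric(1))) + 1)
  else
    return(0)
}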
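To get a feel for what the depth values mean, here is a small check on hand-made lists. The function is restated so the snippet runs on its own: inherits(x, "tbl_df") stands in for the tibble check so that no package is needed, and the max(0, ...) guard for empty lists is my addition. The sample lists are invented for illustration.

```r
# Restated depth function (base R only, see the note above)
depth <- function(list_entry) {
  if (is.list(list_entry) && !inherits(list_entry, "tbl_df")) {
    max(0, vapply(list_entry, depth, numeric(1))) + 1
  } else {
    0
  }
}

depth("a")                                  # 0: a scalar is a leaf
depth(c("label-a", "label-b"))              # 0: an atomic vector is a leaf too
depth(list(x = "A", y = 1))                 # 1: a list of leaves
depth(list(list(x = "A"), list(x = "B")))   # 2: a list of lists of leaves
depth(list())                               # 1: empty list, thanks to the guard
```

So an array of flat JSON objects, like the levelThree array in the output further down, comes out at depth 2.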
The next function recursively tries to build a tibble from depth-1 elements (if they are vectors) or from depth-2 elements (if the tibble's rows are listed individually). A depth-0 element is returned unchanged, and an element that is more than 2 deep, or not suitable for a tibble, has its child nodes passed through recursively for the same treatment.
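recursive_tibble <- function(json_list)
{
  lapply(json_list, function(y)
  {
    d <- depth(y)

    # Leaves are returned unchanged
    if (d == 0)
      return(unlist(y))

    if (d == 1)
    {
      if (length(y) < 2)
        return(unlist(y))
      # Every element carries the same name: stack them as rows
      if (length(unique(names(y))) == 1)
        return(as_tibble(do.call("rbind", lapply(y, unlist))))
      # Equal-length vectors: bind them as columns
      if (length(unique(unlist(lapply(y, length)))) == 1)
        return(as_tibble(do.call("cbind", lapply(y, unlist))))
      return(unlist(y))
    }

    if (d == 2)
    {
      if (length(y) < 2)
        return(recursive_tibble(y))
      # Records that share the same field names become rows of one tibble.
      # Comparing each record to the first works for any number of records.
      field_names <- lapply(y, names)
      if (all(vapply(field_names, identical, logical(1), field_names[[1]])))
        return(as_tibble(do.call("rbind", lapply(y, unlist))))
    }

    # Anything deeper, or not suitable for a tibble, is handled child by child
    recursive_tibble(y)
  })
}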
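The step that actually builds the tibble is the unlist / rbind / as_tibble chain. Here it is in isolation, as a rough sketch on two invented records shaped like the levelThree entries in the output below (the values and their types are my assumption, and the tibble package needs to be installed):

```r
library(tibble)

# Hypothetical records, as a list-of-lists parser might return them
records <- list(
  list(x = "A", y = 1, z = TRUE),
  list(x = "B", y = 2, z = FALSE)
)

# unlist() flattens each record to a named vector; mixed types
# are coerced to character, e.g. c(x = "A", y = "1", z = "TRUE")
rows <- lapply(records, unlist)

# rbind() stacks those vectors into a 2 x 3 character matrix
mat <- do.call("rbind", rows)

# as_tibble() turns the matrix into a tibble with columns x, y, z
tbl <- as_tibble(mat)
```

This is why every column ends up as chr: the coercion happens in unlist(). If you need the original types back, applying type.convert(col, as.is = TRUE) to each column afterwards is one option.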
So now, with x being the list that rjson's fromJSON returned for your JSON, you can do:
str(recursive_tibble(x))
#> List of 2
#> $ :List of 5
#> ..$ _id : chr "1234"
#> ..$ createdAt: chr "2020-01-13 09:00:00"
#> ..$ labels : chr [1:2] "label-a" "label-b"
#> ..$ levelOne :List of 1
#> .. ..$ levelTwo:List of 1
#> .. .. ..$ levelThree:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
#> .. .. .. ..$ x: chr [1:2] "A" "B"
#> .. .. .. ..$ y: chr [1:2] "1" "2"
#> .. .. .. ..$ z: chr [1:2] "TRUE" "FALSE"
#> ..$ schema : chr "0.0.1"
#> $ :List of 5
#> ..$ _id : chr "5678"
#> ..$ createdAt: chr "2020-01-13 09:01:00"
#> ..$ labels : chr [1:2] "label-a" "label-b"
#> ..$ levelOne :List of 1
#> .. ..$ levelTwo:List of 1
#> .. .. ..$ levelThree:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
#> .. .. .. ..$ x: chr [1:2] "A" "B"
#> .. .. .. ..$ y: chr [1:2] "1" "2"
#> .. .. .. ..$ z: chr [1:2] "TRUE" "FALSE"
#> ..$ schema : chr "0.0.1"