How to format data for plotly sunburst diagram

I'm trying to make a sunburst diagram using Plotly via R. I'm struggling with the data model required for the hierarchy, both in conceptualizing how it works and in finding an easy way to transform a regular data frame, with columns representing the different hierarchical levels, into the format needed.

I've looked at examples for plotly sunburst charts in R, e.g., here, and seen the reference page but don't totally get the model for data formatting.

library(plotly)

# Create some fake data - say ownership and land use data with acreage
df <- data.frame(ownership=c(rep("private", 3), rep("public", 3), rep("mixed", 3)),
                 landuse=c(rep(c("residential", "recreation", "commercial"), 3)),
                 acres=c(108,143,102, 300,320,500, 37,58,90))

# Just try some quick pie charts of acreage by landuse and ownership
plot_ly(data=df, labels= ~landuse, values= ~acres, type='pie')
plot_ly(data=df, labels= ~ownership, values= ~acres, type='pie')

# This doesn't render anything... not that I'd expect it to given the data format doesn't seem to match what's needed, 
# but this is what I'd intuitively expect to work
plot_ly(data=df, labels= ~landuse, parents = ~ownership, values= ~acres, type='sunburst')

It would be helpful, given the example code above, or similar, to see how one might go from the data (df) to the format required for the plotly sunburst diagram.

You are absolutely right: compared to the rest of plotly's otherwise intuitive R API, preparing data for a sunburst chart is rather annoying.

I had the same problem and wrote a function based on library(data.table) to prepare the data, accepting two different data.frame input formats.

The format required to generate a sunburst plot from data structured like yours can be seen here under the section Sunburst with Repeated Labels.
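As for why the call in your question renders nothing, my understanding is that two things go wrong at once: the labels are not unique, and the parents never appear as nodes of their own, so there is no single root to hang the hierarchy on. A minimal base R check on your data (the variable names `hasDuplicateLabels` and `parentsAreNodes` are made up for this illustration) makes both points visible:

```r
# Same fake data as in the question.
df <- data.frame(ownership = c(rep("private", 3), rep("public", 3), rep("mixed", 3)),
                 landuse = rep(c("residential", "recreation", "commercial"), 3),
                 acres = c(108, 143, 102, 300, 320, 500, 37, 58, 90),
                 stringsAsFactors = FALSE)

# Without an explicit ids argument the labels have to identify each sector,
# but every land use occurs three times.
hasDuplicateLabels <- anyDuplicated(df$landuse) > 0   # TRUE

# None of the parents ("private", "public", "mixed") exists as a node itself.
parentsAreNodes <- any(df$ownership %in% df$landuse)  # FALSE
```

The repeated-labels format avoids both problems by giving every node a unique id and by adding explicit rows for the parent levels.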

For your example it should look like this:

         labels values         parents                           ids
 1:       total   1658            <NA>                         total
 2:     private    353           total               total - private
 3:      public   1120           total                total - public
 4:       mixed    185           total                 total - mixed
 5: residential    108 total - private total - private - residential
 6:  recreation    143 total - private  total - private - recreation
 7:  commercial    102 total - private  total - private - commercial
 8: residential    300  total - public  total - public - residential
 9:  recreation    320  total - public   total - public - recreation
10:  commercial    500  total - public   total - public - commercial
11: residential     37   total - mixed   total - mixed - residential
12:  recreation     58   total - mixed    total - mixed - recreation
13:  commercial     90   total - mixed    total - mixed - commercial
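The table can be read as an edge list: every row is a node, `ids` names it uniquely, and `parents` holds the `ids` value of the node one level up (`NA` for the root, as in row 1). A small hand-built sketch of just the root and the `private` branch, in base R only, shows the two properties the format relies on. The name `manualDF` is hypothetical, and the root value is cut down to 353 here because only one branch is included:

```r
# Hand-built subset of the table above: root, one owner, and its three land uses.
manualDF <- data.frame(
  ids     = c("total",
              "total - private",
              "total - private - residential",
              "total - private - recreation",
              "total - private - commercial"),
  labels  = c("total", "private", "residential", "recreation", "commercial"),
  parents = c(NA,
              "total",
              "total - private",
              "total - private",
              "total - private"),
  values  = c(353, 353, 108, 143, 102),
  stringsAsFactors = FALSE
)

# Property 1: every non-root parent must itself appear in ids.
parentsKnown <- all(is.na(manualDF$parents) | manualDF$parents %in% manualDF$ids)

# Property 2: each parent's value equals the sum of its direct children,
# which is what branchvalues = 'total' expects the values to mean.
hasParent <- !is.na(manualDF$parents)
childSums <- tapply(manualDF$values[hasParent], manualDF$parents[hasParent], sum)
sumsMatch <- all(childSums == manualDF$values[match(names(childSums), manualDF$ids)])
```

Both `parentsKnown` and `sumsMatch` should be `TRUE` for any table in this format.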

Here is the code to get there:

library(data.table)
library(plotly)

DF <- data.table(ownership=c(rep("private", 3), rep("public",3),rep("mixed", 3)),
                  landuse=c(rep(c("residential", "recreation", "commercial"),3)),
                  acres=c(108, 143, 102, 300, 320, 500, 37, 58, 90))

as.sunburstDF <- function(DF, valueCol = NULL){
  require(data.table)

  DT <- data.table(DF, stringsAsFactors = FALSE)
  DT[, root := "total"]
  setcolorder(DT, c("root", names(DF)))

  hierarchyList <- list()
  if(!is.null(valueCol)){setnames(DT, valueCol, "values", skip_absent=TRUE)}
  hierarchyCols <- setdiff(names(DT), "values")

  for(i in seq_along(hierarchyCols)){
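    # Index into hierarchyCols so the value column is never used for grouping,
    # even if it is not the last column of the input
    currentCols <- hierarchyCols[1:i]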
    if(is.null(valueCol)){
      currentDT <- unique(DT[, ..currentCols][, values := .N, by = currentCols], by = currentCols)
    } else {
      currentDT <- DT[, lapply(.SD, sum, na.rm = TRUE), by=currentCols, .SDcols = "values"]
    }
    setnames(currentDT, length(currentCols), "labels")
    hierarchyList[[i]] <- currentDT
  }

  hierarchyDT <- rbindlist(hierarchyList, use.names = TRUE, fill = TRUE)

  parentCols <- setdiff(names(hierarchyDT), c("labels", "values", valueCol))
  hierarchyDT[, parents := apply(.SD, 1, function(x){fifelse(all(is.na(x)), yes = NA_character_, no = paste(x[!is.na(x)], collapse = " - "))}), .SDcols = parentCols]
  hierarchyDT[, ids := apply(.SD, 1, function(x){paste(x[!is.na(x)], collapse = " - ")}), .SDcols = c("parents", "labels")]
  hierarchyDT[, c(parentCols) := NULL]
  return(hierarchyDT)
}

sunburstDF <- as.sunburstDF(DF, valueCol = "acres")

plot_ly(data = sunburstDF, ids = ~ids, labels= ~labels, parents = ~parents, values= ~values, type='sunburst', branchvalues = 'total')
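If the function feels like a black box, the same 13-row table can be put together by hand for this two-level case. The following is only a base R sketch of the idea (one aggregated block per hierarchy level, stacked with `rbind`), not a replacement for the general function; the names `root`, `level1`, `level2` and `manualSunburstDF` are made up for illustration:

```r
df <- data.frame(ownership = c(rep("private", 3), rep("public", 3), rep("mixed", 3)),
                 landuse = rep(c("residential", "recreation", "commercial"), 3),
                 acres = c(108, 143, 102, 300, 320, 500, 37, 58, 90),
                 stringsAsFactors = FALSE)

# Level 0: a single root row carrying the grand total.
root <- data.frame(labels = "total", values = sum(df$acres),
                   parents = NA_character_, ids = "total",
                   stringsAsFactors = FALSE)

# Level 1: one row per ownership, acres summed over its land uses.
byOwner <- aggregate(acres ~ ownership, data = df, FUN = sum)
level1 <- data.frame(labels = as.character(byOwner$ownership),
                     values = byOwner$acres,
                     parents = "total",
                     ids = paste("total", byOwner$ownership, sep = " - "),
                     stringsAsFactors = FALSE)

# Level 2: the original rows; the parent is the level-1 id of the owner.
level2 <- data.frame(labels = df$landuse,
                     values = df$acres,
                     parents = paste("total", df$ownership, sep = " - "),
                     ids = paste("total", df$ownership, df$landuse, sep = " - "),
                     stringsAsFactors = FALSE)

manualSunburstDF <- rbind(root, level1, level2)
```

`manualSunburstDF` can be passed to the same `plot_ly()` call as above. The row order differs from the function's output because `aggregate()` sorts the groups, which should not matter since the hierarchy is defined by `ids` and `parents`.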

Here is an example for the second data.frame format accepted by the function (valueCol = NULL, because it is calculated from the data):

DF2 <- data.frame(sample(LETTERS[1:3], 100, replace = TRUE),
                 sample(LETTERS[4:6], 100, replace = TRUE),
                 sample(LETTERS[7:9], 100, replace = TRUE),
                 sample(LETTERS[10:12], 100, replace = TRUE),
                 sample(LETTERS[13:15], 100, replace = TRUE),
                 stringsAsFactors = FALSE)

plot_ly(data = as.sunburstDF(DF2), ids = ~ids, labels= ~labels, parents = ~parents, values= ~values, type='sunburst', branchvalues = 'total')
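In this second format there is no value column, so each sector's value is simply the number of rows that share its path. A short base R sketch of that counting step for the first two levels only, assuming hypothetical column names `level1` and `level2` and a fixed seed for reproducibility:

```r
set.seed(42)
DF2small <- data.frame(level1 = sample(LETTERS[1:3], 100, replace = TRUE),
                       level2 = sample(LETTERS[4:6], 100, replace = TRUE),
                       stringsAsFactors = FALSE)

# Value of each first-level sector: how many rows start with that letter.
topCounts <- as.data.frame(table(level1 = DF2small$level1),
                           stringsAsFactors = FALSE)

# Value of each second-level sector: how many rows share the (level1, level2) pair.
pairCounts <- as.data.frame(table(level1 = DF2small$level1,
                                  level2 = DF2small$level2),
                            stringsAsFactors = FALSE)
pairCounts <- pairCounts[pairCounts$Freq > 0, ]  # drop combinations that never occur
```

At every level the counts add up to the number of rows, which is what keeps the values consistent with `branchvalues = 'total'`.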

Please also see library(sunburstR) as an alternative.
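As far as I know, `sunburstR::sunburst()` expects a different input shape: a data frame whose first column is the full path with levels separated by `-` and whose second column is the value. A hedged sketch for the question's data follows; `sequenceDF` and `sb` are names made up here, and the call is guarded so the snippet also runs when the package is not installed:

```r
df <- data.frame(ownership = c(rep("private", 3), rep("public", 3), rep("mixed", 3)),
                 landuse = rep(c("residential", "recreation", "commercial"), 3),
                 acres = c(108, 143, 102, 300, 320, 500, 37, 58, 90),
                 stringsAsFactors = FALSE)

# One row per leaf: "ownership-landuse" path plus its size.
# Assumption: labels themselves contain no "-", since that is the separator.
sequenceDF <- data.frame(sequence = paste(df$ownership, df$landuse, sep = "-"),
                         acres = df$acres,
                         stringsAsFactors = FALSE)

if (requireNamespace("sunburstR", quietly = TRUE)) {
  sb <- sunburstR::sunburst(sequenceDF)
  # Typing sb at the console (or print(sb)) displays the widget.
}
```

Here the parent totals do not have to be supplied; only the leaf rows are needed.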

Source: https://stackoverflow.com/questions/57395424/how-to-format-data-for-plotly-sunburst-diagram

Tags: plotly, r-plotly, sunburst-diagram