How to build a dendrogram from a directory tree?

伪装坚强ぢ 2020-12-23 18:05

Given the absolute path of a root directory, how do I generate a dendrogram object of all paths below it so that I can visualize the directory tree with R?

Suppose the fo

3 Answers
  • 2020-12-23 18:18

    Here is one possible approach to get what you originally asked for, which is something like the system tree command. It gives you a data.tree object that is quite flexible and could be made to plot the way you want, although it is not entirely clear to me what that is:

    path <- c(
        "root/a/some/file.R", 
        "root/a/another/file.R", 
        "root/a/another/cool/file.R", 
        "root/b/some/data.csv", 
        "root/b/more/data.csv"
    )
    
    
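    library(data.tree); library(plyr)
    
    # Split each path into its components, giving one single-row data frame per path
    x <- lapply(strsplit(path, "/"), function(z) as.data.frame(t(z)))
    # Stack the rows; rbind.fill pads the shorter paths with NA
    x <- rbind.fill(x)
    # Rebuild each row into the pathString column that as.Node reads
    x$pathString <- apply(x, 1, function(x) paste(trimws(na.omit(x)), collapse="/"))
    (mytree <- data.tree::as.Node(x))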
    
    1  root                  
    2   ¦--a                 
    3   ¦   ¦--some          
    4   ¦   ¦   °--file.R    
    5   ¦   °--another       
    6   ¦       ¦--file.R    
    7   ¦       °--cool      
    8   ¦           °--file.R
    9   °--b                 
    10      ¦--some          
    11      ¦   °--data.csv  
    12      °--more          
    13          °--data.csv  
    
    
    plot(mytree)
    

    You can get the parts you want (I think), but you will have to do the legwork and figure out the conversion between data types in data.tree: https://cran.r-project.org/web/packages/data.tree/vignettes/data.tree.html#tree-conversion
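
    As a rough sketch of that conversion: data.tree provides an as.dendrogram method for Node objects, so a tree built from path strings can be turned into a base R dendrogram. The variable names below (paths, tree, dend) are illustrative and not from the original answer, and the node heights come from data.tree's defaults, so they only affect the layout:

    ```r
    library(data.tree)

    # Same example paths as above, repeated so the block runs on its own
    paths <- c(
        "root/a/some/file.R",
        "root/a/another/file.R",
        "root/a/another/cool/file.R",
        "root/b/some/data.csv",
        "root/b/more/data.csv"
    )

    # Build the tree directly from the path strings
    tree <- as.Node(data.frame(pathString = paths))

    # Convert to a base R dendrogram via data.tree's method for Node objects
    dend <- as.dendrogram(tree)

    # Draw the dendrogram only when running in an interactive session
    if (interactive()) plot(dend, center = TRUE)
    ```

    The leaves are labelled with the file names. Whether directory names show up on the inner nodes depends on the method's arguments, so it is worth checking ?as.dendrogram.Node before relying on the plot.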

    I use this approach in the tree function of my pathr package when use.data.tree = TRUE: https://github.com/trinker/pathr#tree

    EDIT: Per @Luke's comment below, data.tree::as.Node takes a path directly:

    (mytree <- data.tree::as.Node(data.frame(pathString = path)))
    
                    levelName
    1  root                  
    2   ¦--a                 
    3   ¦   ¦--some          
    4   ¦   ¦   °--file.R    
    5   ¦   °--another       
    6   ¦       ¦--file.R    
    7   ¦       °--cool      
    8   ¦           °--file.R
    9   °--b                 
    10      ¦--some          
    11      ¦   °--data.csv  
    12      °--more          
    13          °--data.csv  
    
  • 2020-12-23 18:18

    It is worth adding that the excellent fs package offers a dir_tree function that brings this functionality to R in a very convenient way.

    tmp_dir <- tempdir()
    # Create some directories
    for (i in 1:10) {
        dir.create(path = file.path(tmp_dir,
                                    basename(tempfile(pattern = "dir")),
                                    basename(tempfile(pattern = "sub_dir"))),
                   recursive = TRUE)
    }
    # Create directory tree
    fs::dir_tree(path = tmp_dir, recurse = TRUE)
    

    Results

    /tmp/RtmpEhB0ne
    ├── dir15213121dd5903
    │   └── sub_dir1521315a5425ba
    ├── dir152131227b086f
    │   └── sub_dir1521314255d96b
    ├── dir152131353e6603
    │   └── sub_dir1521315b52aeed
    ├── dir15213136870535
    │   └── sub_dir15213127b34f64
    ├── dir1521313bbf738b
    │   └── sub_dir152131473939ea
    ├── dir152131403f4fd5
    │   └── sub_dir152131115296e7
    ├── dir152131503d0d55
    │   └── sub_dir15213114368572
    ├── dir1521316f0bb0c3
    │   └── sub_dir1521314aea266b
    ├── dir1521317fe305e9
    │   └── sub_dir152131bcfe8a
    └── dir1521319800dfb
        └── sub_dir15213129defd4a
    

    In addition to printing the directory tree, dir_tree can return the discovered paths as an object.

    # Send the printed tree to a throwaway log file so only the returned paths are kept
    sink(file = tempfile(fileext = ".log"))
    res_fs_tree <- fs::dir_tree(path = tmp_dir, recurse = TRUE)
    sink()
    res_fs_tree[[1]]
    # [1] "/tmp/RtmpEhB0ne/dir15213121dd5903/sub_dir1521315a5425ba"
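
    If fs is not available, a similar vector of paths can be collected with base R's list.files. The sketch below uses a small made-up directory layout (demo_root, a, b, some and file.R are hypothetical names) and builds a pathString column, which is the form that data.tree::as.Node reads:

    ```r
    # Create a small, known directory tree under a fresh temporary root
    root <- file.path(tempdir(), "demo_root")  # hypothetical root name
    unlink(root, recursive = TRUE)             # start from a clean state
    dir.create(file.path(root, "a", "some"), recursive = TRUE)
    dir.create(file.path(root, "b"))
    file.create(file.path(root, "a", "some", "file.R"))

    # List everything below the root, relative to it;
    # include.dirs = TRUE keeps the directories as well as the files
    rel <- list.files(root, recursive = TRUE, include.dirs = TRUE)

    # Prefix each entry with the root's own name so all paths share one top node
    paths_df <- data.frame(
        pathString = file.path(basename(root), rel),
        stringsAsFactors = FALSE
    )
    print(paths_df)
    ```

    Passing paths_df to data.tree::as.Node should then give a tree rooted at demo_root, without any dependency on fs for the listing step.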
    
  • 2020-12-23 18:18

    If you are on Windows, you can use my package dir2json, which can be installed like this:

    drat::addRepo("stlarepo")
    install.packages("dir2json")
    

    It can also be used on Linux, but there the package's DLL is linked to the GHC dynamic libraries, which must be installed on the system (on Windows this DLL is standalone).

    > library(dir2json)
    > cat(dir2tree("src"))
    src
    |
    `- contrib
       |
       +- PACKAGES.gz
       |
       +- PACKAGES
       |
       +- jsonAccess_0.1.1.tar.gz
       |
       +- expansions_1.2.tar.gz
       |
       `- dir2json_2.1.0.tar.gz
    > cat(dir2tree("src", vertical=TRUE))
                                                src                                             
                                                 |                                              
                                              contrib                                           
                                                 |                                              
          ---------------------------------------------------------------------------           
         /          |                 |                       |                      \          
    PACKAGES.gz  PACKAGES  jsonAccess_0.1.1.tar.gz  expansions_1.2.tar.gz  dir2json_2.1.0.tar.gz
    

    The package also contains a Shiny application which generates an interactive Reingold-Tilford tree representation of a folder:

    > dir2json::shinyDirTree(".")
    
