How to obtain paths to all the child nodes in a tree that only have leaves using Clojure zippers?

Question

Say I have a tree like this. I would like to obtain the paths to child nodes that only contain leaves and not non-leaf child nodes.

So for this tree

root
├──leaf123
├──level_a_node1
│  └──leaf456
├──level_a_node2
│  ├──level_b_node1
│  │  └──leaf987
│  └──level_b_node2
│     └──level_c_node1
│        └──leaf654
├──leaf789
└──level_a_node3
   └──leaf432

The result would be

[["root"  "level_a_node1"]
["root"  "level_a_node2" "level_b_node1"]
["root"  "level_a_node2" "level_b_node2" "level_c_node1"]
["root"  "level_a_node3"]]

I've attempted to go down to the bottom nodes and check whether the (lefts) and the (rights) are not branches, but that doesn't quite work.

(z/vector-zip ["root"
               ["level_a_node3" ["leaf432"]]
               ["level_a_node2" ["level_b_node2" ["level_c_node1" ["leaf654"]]] ["level_b_node1" ["leaf987"]] ["leaf789"]]
               ["level_a_node1" ["leaf456"]]
               ["leaf123"]])

Edit: my data actually comes in as a list of paths, and I'm converting that into a tree. But maybe that is an overcomplication?

[["root" "leaf"]
["root"  "level_a_node1" "leaf"]
["root"  "level_a_node2" "leaf"]
["root"  "level_a_node2" "level_b_node1" "leaf"]
["root"  "level_a_node2" "level_b_node2" "level_c_node1" "leaf"]
["root"  "level_a_node3" "leaf"]]

Answer 1:

Hiccup-style structures are a nice place to visit, but I wouldn't want to live there. That is, they're very succinct to write, but a giant pain to manipulate programmatically, because the semantic nesting structure is not reflected in the physical structure of the nodes. So, the first thing I would do is convert to Enlive-style tree representation (or, ideally, generate Enlive to begin with):

(def hiccup
  ["root"
   ["level_a_node3" ["leaf432"]]
   ["level_a_node2"
    ["level_b_node2"
     ["level_c_node1"
      ["leaf654"]]]
    ["level_b_node1"
     ["leaf987"]]
    ["leaf789"]]
   ["level_a_node1"
    ["leaf456"]]
   ["leaf123"]])
(defn hiccup->enlive [x]
  (when (vector? x)
    {:tag (first x)
     :content (map hiccup->enlive (rest x))}))
(def enlive (hiccup->enlive hiccup))

;; Yielding...
{:tag "root",
 :content
 ({:tag "level_a_node3", :content ({:tag "leaf432", :content ()})}
  {:tag "level_a_node2",
   :content
   ({:tag "level_b_node2",
     :content
     ({:tag "level_c_node1",
       :content ({:tag "leaf654", :content ()})})}
    {:tag "level_b_node1", :content ({:tag "leaf987", :content ()})}
    {:tag "leaf789", :content ()})}
  {:tag "level_a_node1", :content ({:tag "leaf456", :content ()})}
  {:tag "leaf123", :content ()})}

Having done this, the last thing getting in your way is your desire to use zippers. They are a good tool for targeted traversals, where you care a lot about the structure near the node you are working on. But if all you care about is the node and its children, it is much easier to just write a simple recursive function to traverse the tree:

(defn paths-to-leaves [{:keys [tag content] :as root}]
  (when (seq content)
    (if (every? #(empty? (:content %)) content)
      [(list tag)]
      (for [child content
            path (paths-to-leaves child)]
        (cons tag path)))))
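
As a quick check, here is the function applied to the tree from the question. The snippet repeats the definitions above so that it can be pasted into a REPL on its own. Note that the paths come out in the order the children appear in the input vector, which differs from the order listed in the question.

```clojure
(def hiccup
  ["root"
   ["level_a_node3" ["leaf432"]]
   ["level_a_node2"
    ["level_b_node2" ["level_c_node1" ["leaf654"]]]
    ["level_b_node1" ["leaf987"]]
    ["leaf789"]]
   ["level_a_node1" ["leaf456"]]
   ["leaf123"]])

(defn hiccup->enlive [x]
  (when (vector? x)
    {:tag (first x)
     :content (map hiccup->enlive (rest x))}))

(defn paths-to-leaves [{:keys [tag content]}]
  (when (seq content)                               ; a leaf contributes no paths
    (if (every? #(empty? (:content %)) content)     ; all children are leaves
      [(list tag)]
      (for [child content
            path (paths-to-leaves child)]
        (cons tag path)))))

(paths-to-leaves (hiccup->enlive hiccup))
;; => (("root" "level_a_node3")
;;     ("root" "level_a_node2" "level_b_node2" "level_c_node1")
;;     ("root" "level_a_node2" "level_b_node1")
;;     ("root" "level_a_node1"))
```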

The ability to write recursive traversals like this is a skill that will serve you many times throughout your Clojure career (for example, a similar question I recently answered on Code Review). It turns out that a huge number of functions on trees are just: call yourself recursively on each child, and somehow combine the results, usually in a possibly-nested for loop. The hard part is just figuring out what your base case needs to be, and the correct sequence of maps/mapcats to combine the results without introducing undesired levels of nesting.
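
To illustrate that shape with a different function, here is a minimal sketch that returns the full path to every leaf of an Enlive-style tree. The name leaf-paths and the small sample tree are made up for this example; only the base case changes compared to paths-to-leaves.

```clojure
(defn leaf-paths
  "Hypothetical example: full path to every leaf of an Enlive-style tree."
  [{:keys [tag content]}]
  (if (empty? content)
    [(list tag)]                      ; base case: a leaf is its own one-element path
    (for [child content
          path  (leaf-paths child)]   ; recursive case: collect each child's paths...
      (cons tag path))))              ; ...and prefix them with this node's tag

(leaf-paths {:tag "root"
             :content [{:tag "a" :content [{:tag "x" :content []}
                                           {:tag "y" :content []}]}
                       {:tag "z" :content []}]})
;; => (("root" "a" "x") ("root" "a" "y") ("root" "z"))
```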

If you insist on sticking with Hiccup, you can de-mangle it at the use site without too much pain:

(defn hiccup-paths-to-leaves [node]
  (when (vector? node)
    (let [tag (first node), content (next node)]
      (if (and content (every? #(= 1 (count %)) content))
        [(list tag)]
        (for [child content
              path (hiccup-paths-to-leaves child)]
          (cons tag path))))))

But it's noticeably messier, and is work you'll have to repeat every time you work with a tree. Again I encourage you to use Enlive-style trees for your internal data representation.
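
Since the question's edit mentions that the data arrives as a list of paths, an Enlive-style tree could also be built from those paths directly. The following is only a sketch under a few assumptions: paths->enlive is a hypothetical helper (not part of Enlive), all paths share the same first element, sibling nodes with the same name are merged into one, and the order of children is not guaranteed.

```clojure
(defn paths->enlive
  "Hypothetical helper: builds an Enlive-style tree from root-to-leaf paths.
  Assumes every path starts with the same root tag."
  [paths]
  (let [;; Nested map keyed by tag, e.g. {\"root\" {\"leaf\" {}, \"a\" {\"leaf\" {}}}}
        trie   (reduce (fn [m path] (update-in m path #(or % {}))) {} paths)
        ;; Turn one [tag children-map] entry into {:tag ... :content ...}
        ->node (fn ->node [[tag children]]
                 {:tag tag
                  :content (map ->node children)})]
    (->node (first trie))))

(def paths
  [["root" "leaf"]
   ["root" "level_a_node1" "leaf"]
   ["root" "level_a_node2" "leaf"]
   ["root" "level_a_node2" "level_b_node1" "leaf"]
   ["root" "level_a_node2" "level_b_node2" "level_c_node1" "leaf"]
   ["root" "level_a_node3" "leaf"]])

(def tree (paths->enlive paths))
```

The resulting tree has the same {:tag ... :content ...} shape as the output of hiccup->enlive, so it can be passed to paths-to-leaves.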

Answer 2:

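You can definitely use the file API to navigate the directory. If you want to use a zipper, you can do this (the code assumes the clojure.zip functions vector-zip, end?, next, down, right, path, and node are referred into the current namespace):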

(loop [loc (vector-zip ["root"
                        ["level_a_node3"
                         ["leaf432"]]
                        ["level_a_node2"
                         ["level_b_node2"
                          ["level_c_node1"
                           ["leaf654"]]]
                         ["level_b_node1"
                          ["leaf987"]]
                         ["leaf789"]]
                        ["level_a_node1"
                         ["leaf456" "leaf456b"]]
                        ["leaf123"]])
       ans nil]
  (if (end? loc)
    ans
    (recur (next loc)
           (cond->> ans
             (contains-leaves-only? loc)
             (cons (->> loc down path (map node)))))))

which will output this:

(("root" "level_a_node1")
 ("root" "level_a_node2" "level_b_node1")
 ("root" "level_a_node2" "level_b_node2" "level_c_node1")
 ("root" "level_a_node3"))

With the way you define the tree, the helper functions can be implemented as follows (they need to be defined before the loop above is evaluated):

(def is-leaf? #(-> % down nil?))

(defn contains-leaves-only?
  [loc]
  (some->> loc
           down            ;; branch name
           right           ;; children list
           down            ;; first child
           (iterate right) ;; and its remaining siblings
           (take-while identity)
           (every? is-leaf?)))

UPDATE: a lazy sequence version

(->> ["root"
      ["level_a_node3"
      ["leaf432"]]
      ["level_a_node2"
      ["level_b_node2"
        ["level_c_node1"
        ["leaf654"]]]
      ["level_b_node1"
        ["leaf987"]]
      ["leaf789"]]
      ["level_a_node1"
      ["leaf456" "leaf456b"]]
      ["leaf123"]]
     vector-zip
     (iterate next)
     (take-while (complement end?))
     (filter contains-leaves-only?)
     (map #(->> % down path (map node))))
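
For reference, here is a self-contained variant of the lazy version that uses an explicit namespace alias instead of bare names, which avoids shadowing clojure.core/next. The names leaf? and leaf-only-paths are introduced just for this sketch. It also uses first to pull the name out of each ancestor vector, because clojure.zip/path returns plain nodes rather than locs.

```clojure
(require '[clojure.zip :as z])

(defn leaf? [loc]
  ;; A plain string is not a branch for vector-zip, so z/down returns nil.
  (nil? (z/down loc)))

(defn contains-leaves-only? [loc]
  (some->> loc
           z/down              ; branch name
           z/right             ; second element of the node
           z/down              ; its first element
           (iterate z/right)   ; and the remaining siblings
           (take-while identity)
           (every? leaf?)))

(defn leaf-only-paths [tree]
  (->> (z/vector-zip tree)
       (iterate z/next)
       (take-while (complement z/end?))
       (filter contains-leaves-only?)
       ;; z/path gives the ancestor vectors; the first element of each is its name.
       (map (fn [loc] (map first (z/path (z/down loc)))))))

(leaf-only-paths
 ["root"
  ["level_a_node3" ["leaf432"]]
  ["level_a_node2"
   ["level_b_node2" ["level_c_node1" ["leaf654"]]]
   ["level_b_node1" ["leaf987"]]
   ["leaf789"]]
  ["level_a_node1" ["leaf456" "leaf456b"]]
  ["leaf123"]])
;; => (("root" "level_a_node3")
;;     ("root" "level_a_node2" "level_b_node2" "level_c_node1")
;;     ("root" "level_a_node2" "level_b_node1")
;;     ("root" "level_a_node1"))
```

Because this version walks the tree lazily in depth-first order, the paths appear in traversal order, which is the reverse of what the loop version builds with cons.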

Answer 3:

It is because zippers have so many limitations that I created the Tupelo Forest library for processing tree-like data structures. Your problem then has a simple solution:

(ns tst.tupelo.forest-examples
  (:use tupelo.core tupelo.forest tupelo.test))

  (with-forest (new-forest)
    (let [data          ["root"
                         ["level_a_node3" ["leaf"]]
                         ["level_a_node2"
                          ["level_b_node2"
                           ["level_c_node1"
                            ["leaf"]]]
                          ["level_b_node1" ["leaf"]]]
                         ["level_a_node1" ["leaf"]]
                         ["leaf"]]
          root-hid      (add-tree-hiccup data)
          leaf-paths    (find-paths-with root-hid [:** :*] leaf-path?)]

with a tree that looks like:

(hid->bush root-hid) => 
    [{:tag "root"}
     [{:tag "level_a_node3"}
      [{:tag "leaf"}]]
     [{:tag "level_a_node2"}
      [{:tag "level_b_node2"}
       [{:tag "level_c_node1"}
        [{:tag "leaf"}]]]
      [{:tag "level_b_node1"}
       [{:tag "leaf"}]]]
     [{:tag "level_a_node1"}
      [{:tag "leaf"}]]
     [{:tag "leaf"}]])

and a result like:

(format-paths leaf-paths) => 
    [[{:tag "root"} [{:tag "level_a_node3"} [{:tag "leaf"}]]]
     [{:tag "root"} [{:tag "level_a_node2"} [{:tag "level_b_node2"} [{:tag "level_c_node1"} [{:tag "leaf"}]]]]]
     [{:tag "root"} [{:tag "level_a_node2"} [{:tag "level_b_node1"} [{:tag "leaf"}]]]]
     [{:tag "root"} [{:tag "level_a_node1"} [{:tag "leaf"}]]]
     [{:tag "root"} [{:tag "leaf"}]]]))))

There are many choices after this depending on the next steps in the processing chain.

Source: https://stackoverflow.com/questions/56030511/how-to-obtain-paths-to-all-the-child-nodes-in-a-tree-that-only-have-leaves-using

Tags

clojure

tree

zipper