I have the following XML structure:
(def xmlstr
\"
A AA
The Tupelo library can easily solve this problem using tupelo.forest. You can find the API docs on GitHub Pages. Below is a test case using your example.
Here we load your XML data and convert it first into Enlive format and then into the native tree structure used by tupelo.forest:
(ns tst.tupelo.forest-examples
(:use tupelo.forest tupelo.test )
(:require
[clojure.data.xml :as dx]
[clojure.java.io :as io]
[clojure.set :as cs]
[net.cgrand.enlive-html :as en-html]
[schema.core :as s]
[tupelo.core :as t]
[tupelo.string :as ts]))
(t/refer-tupelo)
; Discard any xml nodes of Type="B" or Type="C" (plus blank string nodes)
(dotest
(with-forest (new-forest)
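(let [xml-str "<ROOT>
 <Items>
 <Item><Type>A</Type><Note>AA1</Note></Item>
 <Item><Type>B</Type><Note>BB1</Note></Item>
 <Item><Type>C</Type><Note>CC1</Note></Item>
 <Item><Type>A</Type><Note>AA2</Note></Item>
 </Items>
 </ROOT>"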
enlive-tree (->> xml-str
java.io.StringReader.
en-html/xml-resource
first)
root-hid (add-tree-enlive enlive-tree)
tree-1 (hid->tree root-hid)
The hid suffix stands for "Hex ID", a unique hex value that acts like a pointer to a node or leaf in the tree. At this stage we have just loaded the data into the forest data structure, creating tree-1, which looks like:
(is= tree-1
{:attrs {:tag :ROOT},
:kids [{:attrs {:tag :tupelo.forest/raw},
:value "\n "}
{:attrs {:tag :Items},
:kids [{:attrs {:tag :tupelo.forest/raw},
:value "\n "}
{:attrs {:tag :Item},
:kids [{:attrs {:tag :Type}, :value "A"}
{:attrs {:tag :Note}, :value "AA1"}]}
{:attrs {:tag :tupelo.forest/raw},
:value "\n "}
{:attrs {:tag :Item},
:kids [{:attrs {:tag :Type}, :value "B"}
{:attrs {:tag :Note}, :value "BB1"}]}
{:attrs {:tag :tupelo.forest/raw},
:value "\n "}
{:attrs {:tag :Item},
:kids [{:attrs {:tag :Type}, :value "C"}
{:attrs {:tag :Note}, :value "CC1"}]}
{:attrs {:tag :tupelo.forest/raw},
:value "\n "}
{:attrs {:tag :Item},
:kids [{:attrs {:tag :Type}, :value "A"}
{:attrs {:tag :Note}, :value "AA2"}]}
{:attrs {:tag :tupelo.forest/raw},
:value "\n "}]}
{:attrs {:tag :tupelo.forest/raw},
:value "\n "}]})
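To make the "hid as pointer" idea concrete, here is a minimal sketch in plain Clojure. It is only an illustration and is not claimed to be how tupelo.forest stores its data: the names toy-forest and toy-id->tree, and the ids "a1", "b2", "c3", are hypothetical. Nodes live in a flat map keyed by id, parents refer to their kids by id, and a nested tree is rebuilt on demand.

```clojure
;; Toy forest (hypothetical, for illustration only): a flat map from id to node.
;; A parent holds the ids of its kids rather than the kid nodes themselves.
(def toy-forest
  {"a1" {:attrs {:tag :Item} :kids ["b2" "c3"]}
   "b2" {:attrs {:tag :Type} :value "A"}
   "c3" {:attrs {:tag :Note} :value "AA1"}})

;; Follow the id "pointers" recursively to rebuild a nested tree.
(defn toy-id->tree [forest id]
  (let [node (get forest id)]
    (if (contains? node :kids)
      (update node :kids
              (fn [kid-ids] (mapv #(toy-id->tree forest %) kid-ids)))
      node)))

(toy-id->tree toy-forest "a1")
```

Because every node is reachable by its id, removing or editing a node only touches the flat map, which is why the forest functions in this answer take hids rather than nested maps.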
We next remove any blank strings with this code:
blank-leaf-hid? (fn [hid] (and (leaf-hid? hid) ; ensure it is a leaf node
(let [value (hid->value hid)]
(and (string? value)
(or (zero? (count value)) ; empty string
(ts/whitespace? value)))))) ; all whitespace string
blank-leaf-hids (keep-if blank-leaf-hid? (all-hids))
>> (apply remove-hid blank-leaf-hids)
tree-2 (hid->tree root-hid)
yielding tree-2, which looks much neater:
(is= tree-2
{:attrs {:tag :ROOT},
:kids [{:attrs {:tag :Items},
:kids [{:attrs {:tag :Item},
:kids [{:attrs {:tag :Type}, :value "A"}
{:attrs {:tag :Note}, :value "AA1"}]}
{:attrs {:tag :Item},
:kids [{:attrs {:tag :Type}, :value "B"}
{:attrs {:tag :Note}, :value "BB1"}]}
{:attrs {:tag :Item},
:kids [{:attrs {:tag :Type}, :value "C"}
{:attrs {:tag :Note}, :value "CC1"}]}
{:attrs {:tag :Item},
:kids [{:attrs {:tag :Type}, :value "A"}
{:attrs {:tag :Note}, :value "AA2"}]}]}]})
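For comparison, the same blank-leaf pruning can be sketched in plain Clojure on the nested-map form shown above, without the forest. This is a rough equivalent under stated assumptions: blank-leaf? and prune-blank-leaves are hypothetical helpers, not part of Tupelo, and a leaf is taken to be any map without a :kids key.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: true for a leaf map (no :kids) whose :value
;; is an empty or all-whitespace string.
(defn blank-leaf? [node]
  (and (not (contains? node :kids))
       (string? (:value node))
       (str/blank? (:value node))))

;; Recursively drop blank leaves from a nested-map tree.
(defn prune-blank-leaves [node]
  (if (contains? node :kids)
    (update node :kids
            (fn [kids]
              (->> kids
                   (remove blank-leaf?)
                   (mapv prune-blank-leaves))))
    node))

(def sample-tree
  {:attrs {:tag :Items}
   :kids  [{:attrs {:tag :tupelo.forest/raw} :value "\n  "}
           {:attrs {:tag :Item}
            :kids  [{:attrs {:tag :Type} :value "A"}
                    {:attrs {:tag :Note} :value "AA1"}]}
           {:attrs {:tag :tupelo.forest/raw} :value "\n"}]})

(prune-blank-leaves sample-tree)
```

The recursive version rebuilds the tree top-down, whereas the forest version above collects the offending hids first and removes them in one call.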
The final code fragment removes every Item node that contains a Type="B" or Type="C" leaf:
type-bc-hid? (fn [hid] (pos? (count (glue
(find-leaf-hids hid [:** :Type] "B")
(find-leaf-hids hid [:** :Type] "C")))))
type-bc-hids (find-hids-with root-hid [:** :Item] type-bc-hid?)
>> (apply remove-hid type-bc-hids)
tree-3 (hid->tree root-hid)
tree-3-hiccup (hid->hiccup root-hid) ]
yielding the final result, shown in both tree format and hiccup format:
(is= tree-3
{:attrs {:tag :ROOT},
:kids
[{:attrs {:tag :Items},
:kids [{:attrs {:tag :Item},
:kids [{:attrs {:tag :Type}, :value "A"}
{:attrs {:tag :Note}, :value "AA1"}]}
{:attrs {:tag :Item},
:kids [{:attrs {:tag :Type}, :value "A"}
{:attrs {:tag :Note}, :value "AA2"}]}]}]})
(is= tree-3-hiccup
[:ROOT
[:Items
[:Item [:Type "A"] [:Note "AA1"]]
[:Item [:Type "A"] [:Note "AA2"]]]]))))
The full example can be found in the forest-examples unit test.
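As a rough plain-Clojure counterpart to the Item-removal step, the sketch below filters a nested-map tree directly. The helpers item-of-type? and remove-items-of-type are hypothetical and not part of Tupelo. Unlike the [:** :Type] path used above, which matches at any depth, this simplified version only inspects the direct children of each Item.

```clojure
;; Hypothetical helper: true when `node` is an :Item with a direct :Type
;; child whose value is in the set `types`.
(defn item-of-type? [types node]
  (and (= :Item (get-in node [:attrs :tag]))
       (boolean (some (fn [kid]
                        (and (= :Type (get-in kid [:attrs :tag]))
                             (contains? types (:value kid))))
                      (:kids node)))))

;; Recursively remove matching :Item nodes from a nested-map tree.
(defn remove-items-of-type [types node]
  (if (contains? node :kids)
    (update node :kids
            (fn [kids]
              (->> kids
                   (remove (partial item-of-type? types))
                   (mapv (partial remove-items-of-type types)))))
    node))

(def items-tree
  {:attrs {:tag :Items}
   :kids  [{:attrs {:tag :Item}
            :kids  [{:attrs {:tag :Type} :value "A"}
                    {:attrs {:tag :Note} :value "AA1"}]}
           {:attrs {:tag :Item}
            :kids  [{:attrs {:tag :Type} :value "B"}
                    {:attrs {:tag :Note} :value "BB1"}]}
           {:attrs {:tag :Item}
            :kids  [{:attrs {:tag :Type} :value "C"}
                    {:attrs {:tag :Note} :value "CC1"}]}]})

(remove-items-of-type #{"B" "C"} items-tree)
```

Passing the unwanted types as a set keeps the predicate to a single membership test, so adding another type to discard is a one-element change.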
Here is the most compact version with extra features removed:
(dotest
(with-forest (new-forest)
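(let [xml-str "<ROOT>
 <Items>
 <Item><Type>A</Type><Note>AA1</Note></Item>
 <Item><Type>B</Type><Note>BB1</Note></Item>
 <Item><Type>C</Type><Note>CC1</Note></Item>
 <Item><Type>A</Type><Note>AA2</Note></Item>
 </Items>
 </ROOT>"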
enlive-tree (->> xml-str
java.io.StringReader.
en-html/xml-resource
first)
root-hid (add-tree-enlive enlive-tree)
blank-leaf-hid? (fn [hid] (ts/whitespace? (hid->value hid)))
has-bc-leaf? (fn [hid] (or (has-child-leaf? hid [:** :Type] "B")
(has-child-leaf? hid [:** :Type] "C")))
blank-leaf-hids (keep-if blank-leaf-hid? (all-leaf-hids))
>> (apply remove-hid blank-leaf-hids)
bc-item-hids (find-hids-with root-hid [:** :Item] has-bc-leaf?)]
(apply remove-hid bc-item-hids)
(is= (hid->hiccup root-hid)
[:ROOT
[:Items
[:Item [:Type "A"] [:Note "AA1"]]
[:Item [:Type "A"] [:Note "AA2"]]]]))))
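The relationship between the tree format and the hiccup format used in these assertions can be sketched with a small converter in plain Clojure. The function tree->hiccup is a hypothetical helper written for this illustration, not Tupelo's hid->hiccup, and it assumes that :tag is the only attribute present, as in the trees shown here.

```clojure
;; Hypothetical helper: convert a nested-map tree into hiccup vectors.
;; Assumes each node carries only a :tag attribute; other attrs are ignored.
(defn tree->hiccup [node]
  (let [tag (get-in node [:attrs :tag])]
    (if (contains? node :kids)
      (into [tag] (map tree->hiccup (:kids node)))
      [tag (:value node)])))

(def item-tree
  {:attrs {:tag :Item}
   :kids  [{:attrs {:tag :Type} :value "A"}
           {:attrs {:tag :Note} :value "AA1"}]})

(tree->hiccup item-tree)
;; => [:Item [:Type "A"] [:Note "AA1"]]
```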