Remove specific XML nodes using Clojure

悲&欢浪女 2021-01-24 07:45

I have the following XML structure:

(def xmlstr
"<ROOT>
  <Items>
    <Item><Type>A</Type><Note>AA</Note></Item>


        
3 Answers
  •  时光说笑
    2021-01-24 08:23

    The Tupelo library can easily solve this problem using tupelo.forest. You can find the API docs on GitHub Pages. Below is a test case using your example.

    Here we load your XML data and convert it first into Enlive format and then into the native tree structure used by tupelo.forest:

    (ns tst.tupelo.forest-examples
      (:use tupelo.forest tupelo.test )
      (:require
        [clojure.data.xml :as dx]
        [clojure.java.io :as io]
        [clojure.set :as cs]
        [net.cgrand.enlive-html :as en-html]
        [schema.core :as s]
        [tupelo.core :as t]
        [tupelo.string :as ts]))
    (t/refer-tupelo)
    
    ; Discard any xml nodes of Type="B" or Type="C" (plus blank string nodes)
    (dotest
      (with-forest (new-forest)
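        (let [xml-str         "<ROOT>
                                <Items>
                                  <Item><Type>A</Type><Note>AA1</Note></Item>
                                  <Item><Type>B</Type><Note>BB1</Note></Item>
                                  <Item><Type>C</Type><Note>CC1</Note></Item>
                                  <Item><Type>A</Type><Note>AA2</Note></Item>
                                </Items>
                              </ROOT>"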
              enlive-tree     (->> xml-str
                                java.io.StringReader.
                                en-html/xml-resource
                                first)
              root-hid        (add-tree-enlive enlive-tree)
              tree-1          (hid->tree root-hid)
    

    The hid suffix stands for "Hex ID", a unique hex value that acts like a pointer to a node or leaf in the tree. At this stage we have only loaded the data into the forest data structure, creating tree-1, which looks like:

     (is= tree-1
       {:attrs {:tag :ROOT},
        :kids  [{:attrs {:tag :tupelo.forest/raw},
                 :value "\n                            "}
                {:attrs {:tag :Items},
                 :kids  [{:attrs {:tag :tupelo.forest/raw},
                          :value "\n                              "}
                         {:attrs {:tag :Item},
                          :kids  [{:attrs {:tag :Type}, :value "A"}
                                  {:attrs {:tag :Note}, :value "AA1"}]}
                         {:attrs {:tag :tupelo.forest/raw},
                          :value "\n                              "}
                         {:attrs {:tag :Item},
                          :kids  [{:attrs {:tag :Type}, :value "B"}
                                  {:attrs {:tag :Note}, :value "BB1"}]}
                         {:attrs {:tag :tupelo.forest/raw},
                          :value "\n                              "}
                         {:attrs {:tag :Item},
                          :kids  [{:attrs {:tag :Type}, :value "C"}
                                  {:attrs {:tag :Note}, :value "CC1"}]}
                         {:attrs {:tag :tupelo.forest/raw},
                          :value "\n                              "}
                         {:attrs {:tag :Item},
                          :kids  [{:attrs {:tag :Type}, :value "A"}
                                  {:attrs {:tag :Note}, :value "AA2"}]}
                         {:attrs {:tag :tupelo.forest/raw},
                          :value "\n                            "}]}
                {:attrs {:tag :tupelo.forest/raw},
                 :value "\n                          "}]})
    

    We next remove any blank strings with this code:

    blank-leaf-hid? (fn [hid] (and (leaf-hid? hid) ; ensure it is a leaf node
                                (let [value (hid->value hid)]
                                  (and (string? value)
                                    (or (zero? (count value)) ; empty string
                                      (ts/whitespace? value)))))) ; all whitespace string
    
    blank-leaf-hids (keep-if blank-leaf-hid? (all-hids))
    >>              (apply remove-hid blank-leaf-hids)
    tree-2          (hid->tree root-hid)
    

    yielding tree-2 which looks much neater:

    (is= tree-2
      {:attrs {:tag :ROOT},
       :kids  [{:attrs {:tag :Items},
                :kids  [{:attrs {:tag :Item},
                         :kids  [{:attrs {:tag :Type}, :value "A"}
                                 {:attrs {:tag :Note}, :value "AA1"}]}
                        {:attrs {:tag :Item},
                         :kids  [{:attrs {:tag :Type}, :value "B"}
                                 {:attrs {:tag :Note}, :value "BB1"}]}
                        {:attrs {:tag :Item},
                         :kids  [{:attrs {:tag :Type}, :value "C"}
                                 {:attrs {:tag :Note}, :value "CC1"}]}
                        {:attrs {:tag :Item},
                         :kids  [{:attrs {:tag :Type}, :value "A"}
                                 {:attrs {:tag :Note}, :value "AA2"}]}]}]})
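
    For readers without Tupelo at hand, the blank-leaf removal above can be sketched on plain Clojure maps shaped like the tree-1 output. This is only an illustration under assumptions: blank-leaf? and remove-blank-leaves are hypothetical helpers, not part of tupelo.forest, and they work on nested data rather than on hids in a forest.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not part of tupelo.forest): a blank leaf is a node
;; without :kids whose :value is an empty or all-whitespace string.
(defn blank-leaf?
  [node]
  (and (not (contains? node :kids))
       (string? (:value node))
       (str/blank? (:value node))))

;; Hypothetical helper: return a copy of the tree with blank leaves dropped
;; at every level.
(defn remove-blank-leaves
  [node]
  (if (contains? node :kids)
    (update node :kids
            (fn [kids]
              (->> kids
                   (remove blank-leaf?)
                   (mapv remove-blank-leaves))))
    node))

;; A small sample in the same shape as tree-1 (whitespace shortened).
(def sample-tree
  {:attrs {:tag :Items}
   :kids  [{:attrs {:tag :tupelo.forest/raw} :value "\n  "}
           {:attrs {:tag :Item}
            :kids  [{:attrs {:tag :Type} :value "A"}
                    {:attrs {:tag :Note} :value "AA1"}]}
           {:attrs {:tag :tupelo.forest/raw} :value "\n"}]})

(def cleaned (remove-blank-leaves sample-tree))
```

    Here cleaned keeps only the :Item node, mirroring the step from tree-1 to tree-2.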
    

    The final code fragment removes every Item node whose Type is "B" or "C":

    type-bc-hid?    (fn [hid] (pos? (count (glue
                                (find-leaf-hids hid [:** :Type] "B")
                                (find-leaf-hids hid [:** :Type] "C")))))
    
    type-bc-hids    (find-hids-with root-hid [:** :Item] type-bc-hid?)
    >>              (apply remove-hid type-bc-hids)
    tree-3          (hid->tree root-hid)
    tree-3-hiccup   (hid->hiccup root-hid) ]
    

    yielding the final result tree shown in both tree format and hiccup format:

    (is= tree-3
      {:attrs {:tag :ROOT},
       :kids
              [{:attrs {:tag :Items},
                :kids  [{:attrs {:tag :Item},
                         :kids  [{:attrs {:tag :Type}, :value "A"}
                                 {:attrs {:tag :Note}, :value "AA1"}]}
                        {:attrs {:tag :Item},
                         :kids  [{:attrs {:tag :Type}, :value "A"}
                                 {:attrs {:tag :Note}, :value "AA2"}]}]}]})
    (is= tree-3-hiccup
      [:ROOT
       [:Items
        [:Item [:Type "A"] [:Note "AA1"]]
        [:Item [:Type "A"] [:Note "AA2"]]]]))))
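
    The Item filtering step can also be sketched on plain maps in the tree-2 shape. The helpers has-type? and remove-items-of-type below are hypothetical names, not tupelo.forest functions, and the sketch assumes that :Type is a direct child of :Item (the forest path [:** :Type] matches at any depth).

```clojure
;; Hypothetical helper: true if `node` has a direct :Type child whose
;; value is a member of the set `types`.
(defn has-type?
  [types node]
  (boolean
    (some (fn [kid]
            (and (= :Type (get-in kid [:attrs :tag]))
                 (contains? types (:value kid))))
          (:kids node))))

;; Hypothetical helper: return a copy of the tree without any :Item node
;; whose :Type value is in `types`.
(defn remove-items-of-type
  [types node]
  (if (contains? node :kids)
    (update node :kids
            (fn [kids]
              (->> kids
                   (remove #(and (= :Item (get-in % [:attrs :tag]))
                                 (has-type? types %)))
                   (mapv #(remove-items-of-type types %)))))
    node))

;; Same shape as tree-2 in the answer.
(def tree-2-data
  {:attrs {:tag :ROOT}
   :kids  [{:attrs {:tag :Items}
            :kids  [{:attrs {:tag :Item}
                     :kids  [{:attrs {:tag :Type} :value "A"}
                             {:attrs {:tag :Note} :value "AA1"}]}
                    {:attrs {:tag :Item}
                     :kids  [{:attrs {:tag :Type} :value "B"}
                             {:attrs {:tag :Note} :value "BB1"}]}
                    {:attrs {:tag :Item}
                     :kids  [{:attrs {:tag :Type} :value "C"}
                             {:attrs {:tag :Note} :value "CC1"}]}
                    {:attrs {:tag :Item}
                     :kids  [{:attrs {:tag :Type} :value "A"}
                             {:attrs {:tag :Note} :value "AA2"}]}]}]})

(def tree-3-data (remove-items-of-type #{"B" "C"} tree-2-data))
```

    Passing the unwanted types as a set keeps the predicate a single membership test instead of one lookup per type.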
    

    The full example can be found in the forest-examples unit test.

    Update

    Here is the most compact version with extra features removed:

    (dotest
      (with-forest (new-forest)
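        (let [xml-str         "<ROOT>
                                <Items>
                                  <Item><Type>A</Type><Note>AA1</Note></Item>
                                  <Item><Type>B</Type><Note>BB1</Note></Item>
                                  <Item><Type>C</Type><Note>CC1</Note></Item>
                                  <Item><Type>A</Type><Note>AA2</Note></Item>
                                </Items>
                              </ROOT>"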
              enlive-tree     (->> xml-str
                                java.io.StringReader.
                                en-html/xml-resource
                                first)
              root-hid        (add-tree-enlive enlive-tree)
              blank-leaf-hid? (fn [hid] (ts/whitespace? (hid->value hid)))
              has-bc-leaf?    (fn [hid] (or (has-child-leaf? hid [:** :Type] "B")
                                            (has-child-leaf? hid [:** :Type] "C")))
              blank-leaf-hids (keep-if blank-leaf-hid? (all-leaf-hids))
              >>              (apply remove-hid blank-leaf-hids)
              bc-item-hids    (find-hids-with root-hid [:** :Item] has-bc-leaf?)]
          (apply remove-hid bc-item-hids)
          (is= (hid->hiccup root-hid)
            [:ROOT
             [:Items
              [:Item [:Type "A"] [:Note "AA1"]]
              [:Item [:Type "A"] [:Note "AA2"]]]]))))
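
    For comparison, here is a rough sketch of the same filtering using only clojure.xml from the standard library instead of Tupelo and Enlive. It is not taken from the answer: parse-str, item-type, and drop-items are hypothetical helper names, and the sketch assumes each Item has a direct Type child containing plain text.

```clojure
(require '[clojure.xml :as xml])

(def xml-str
  "<ROOT>
     <Items>
       <Item><Type>A</Type><Note>AA1</Note></Item>
       <Item><Type>B</Type><Note>BB1</Note></Item>
       <Item><Type>C</Type><Note>CC1</Note></Item>
       <Item><Type>A</Type><Note>AA2</Note></Item>
     </Items>
   </ROOT>")

;; Hypothetical helper: clojure.xml/parse takes a stream, so wrap the string.
(defn parse-str
  [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; Hypothetical helper: text of the first direct <Type> child, or nil.
(defn item-type
  [item]
  (->> (:content item)
       (filter #(and (map? %) (= :Type (:tag %))))
       first
       :content
       first))

;; Hypothetical helper: drop every <Item> whose Type text is in `types`.
;; Element nodes are maps; text nodes are strings and are returned as-is.
(defn drop-items
  [types node]
  (if (map? node)
    (assoc node :content
           (->> (:content node)
                (remove #(and (map? %)
                              (= :Item (:tag %))
                              (contains? types (item-type %))))
                (mapv #(drop-items types %))))
    node))

(def result (drop-items #{"B" "C"} (parse-str xml-str)))

;; Types of the Items that survive the filter.
(def remaining-types
  (->> (:content result)
       (filter map?)
       first
       :content
       (filter map?)
       (map item-type)))
```

    The result is still in clojure.xml's {:tag :attrs :content} shape, so it can be processed further with the same functions.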
    
