I need to very efficiently compare two maps in Clojure/Java, and return the difference as determined by Java\'s .equals(..), with nil/null equivalent to \"not present\".
You could also just use Maps.difference(..) method from Google's Guava libraries
I'm not sure what the absolutely most efficient way to do this is, but here's a couple of things which may be useful:
The basic expectation of behaviour from the question text is impossible: if a
and b
are maps such that b
contains at least one key not present in a
, (merge b <sth>)
cannot be equal to a
.
If you end up going with an interop solution but then need to go back to a PersistentHashMap
at some point, there's always
(clojure.lang.PersistentHashMap/create
(doto (java.util.HashMap.)
(.put :foo 1)
(.put :bar 2)))
; => {:foo 1 :bar 2}
If you need to pass the keyset of a Clojure map to a Java method, you can use
(.keySet {:foo 1 :bar 2})
; => #< [:foo, :bar]>
If all keys involved are guaranteed to be Comparable
, this could be exploited for efficient computation of difference
on maps with many keys (sort & merge scan). For unconstrained keys this is of course a no-go and for small maps it could actually hurt performance.
It's good to have a version written in Clojure, if only to set a baseline performance expectation. Here is one: (updated)
(defn map-difference [m1 m2]
(loop [m (transient {})
ks (concat (keys m1) (keys m2))]
(if-let [k (first ks)]
(let [e1 (find m1 k)
e2 (find m2 k)]
(cond (and e1 e2 (not= (e1 1) (e2 1))) (recur (assoc! m k (e1 1)) (next ks))
(not e1) (recur (assoc! m k (e2 1)) (next ks))
(not e2) (recur (assoc! m k (e1 1)) (next ks))
:else (recur m (next ks))))
(persistent! m))))
I think that just doing (concat (keys m1) (keys m2))
and possibly duplicating some work is likely more efficient most of the time than checking a given key is in "the other map" too at every step.
To wrap up the answer, here's a very simple-minded set-based version with the nice property that it says what it does -- if I misunderstood the spec, it should be readily apparent here. :-)
(defn map-difference [m1 m2]
(let [ks1 (set (keys m1))
ks2 (set (keys m2))
ks1-ks2 (set/difference ks1 ks2)
ks2-ks1 (set/difference ks2 ks1)
ks1*ks2 (set/intersection ks1 ks2)]
(merge (select-keys m1 ks1-ks2)
(select-keys m2 ks2-ks1)
(select-keys m1
(remove (fn [k] (= (m1 k) (m2 k)))
ks1*ks2)))))
I am not sure about its performance
(defn map-difference
[orig other]
(let [changed (set/difference (set orig) (set other))
added (set/difference (set (keys other)) (set (keys orig)))]
(reduce (fn [acc key]
(assoc acc key :missing))
(into {} changed)
added)))
I used :missing
key to avoid ambiguity between a nil
value in the original map, and a missing key -- you can of course change it to nil
.
What about...
(defn map-diff [m1 m2]
;; m1: hashmap
;; m2: hashmap
;; => the difference between them
(reduce merge
(map #(hash-map % (- (or (% m1) 0) (or (% m2) 0)))
(keys (merge m1 m2)))))
Clojure maps will be fine because reading clojure maps is very fast.
I can't answer you but I can give you something to look at. Brenton Ashworth made a testtool where he solved the problem with map compares. Maybe you can look at his code to get hint for implementation. http://formpluslogic.blogspot.com/2010/07/better-clojure-test-results-with-deview.html and http://github.com/brentonashworth/deview
I don't think there is a better implementation that compare the keys and look up if the are different.
In Java, Google Commons Collections offer a quite performant solution.