Why does inserting 1000 000 values into a transient map in Clojure yield a map with 8 items in it?

一生所求 2020-12-11 02:29

If I do 1000 000 assoc! calls on a transient vector, I'll get a vector of 1000 000 elements:

(count
  (let [m (transient [])]
    (dotimes [i 1         


        
2 Answers
  • 2020-12-11 03:10

    The simplest explanation is from the Clojure documentation itself (emphasis mine):

    Transients support a parallel set of 'changing' operations, with similar names followed by ! - assoc!, conj! etc. These do the same things as their persistent counterparts except the return values are themselves transient. Note in particular that transients are not designed to be bashed in-place. You must capture and use the return value in the next call.
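
    One way to "capture and use the return value in the next call", as the quoted documentation puts it, is an explicit loop/recur that threads each assoc! result into the next iteration. This is only a minimal sketch; the name fill-transient is made up for illustration and is not part of Clojure.

    ```clojure
    (defn fill-transient
      "Hypothetical helper: builds the map {0 0, 1 1, ...} with n entries,
      passing the value returned by each assoc! into the next iteration."
      [n]
      (loop [t (transient {})
             i 0]
        (if (< i n)
          ;; The result of assoc! becomes the new `t`; the old `t` is never reused.
          (recur (assoc! t i i) (inc i))
          (persistent! t))))

    (count (fill-transient 1000000))
    ;; => 1000000
    ```

    Because the transient is rebound on every step, it does not matter whether assoc! hands back the same object or a different one.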

  • 2020-12-11 03:20

    The transient datatypes' operations don't guarantee that they will return the same reference as the one passed in. Sometimes the implementation might decide to return a new (but still transient) map after an assoc! rather than using the one you passed in.

    The ClojureDocs page on assoc! has a nice example that explains this behavior:

    ;; The key concept to understand here is that transients are 
    ;; not meant to be `bashed in place`; always use the value 
    ;; returned by either assoc! or other functions that operate
    ;; on transients.
    
    (defn merge2
      "An example implementation of `merge` using transients."
      [x y]
      (persistent! (reduce
                    (fn [res [k v]] (assoc! res k v))
                    (transient x)
                    y)))
    
    ;; Why always use the return value, and not the original?  Because the return
    ;; value might be a different object than the original.  The implementation
    ;; of Clojure transients in some cases changes the internal representation
    ;; of a transient collection (e.g. when it reaches a certain size).  In such
    ;; cases, if you continue to try modifying the original object, the results
    ;; will be incorrect.
    
    ;; Think of transients like persistent collections in how you write code to
    ;; update them, except unlike persistent collections, the original collection
    ;; you passed in should be treated as having an undefined value.  Only the return
    ;; value is predictable.
    

    I'd like to repeat that last part because it's very important: the original collection you passed in should be treated as having an undefined value. Only the return value is predictable.

    Here's a modified version of your code that works as expected:

    (count
      (let [m (transient {})]
        (persistent!
          (reduce (fn [acc i] (assoc! acc i i))
                  m (range 1000000)))))
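
    If the goal is only to build the map, and not to handle transients by hand, clojure.core/into may be an alternative: it uses transients internally when the target collection supports them, so there is no return value to lose track of. A small sketch, where identity-map is a hypothetical name and the transducer arity of into assumes Clojure 1.7 or later:

    ```clojure
    (defn identity-map
      "Hypothetical helper: returns {0 0, 1 1, ..., (dec n) (dec n)}."
      [n]
      ;; (map f) with no collection is a transducer that turns each i into [i i];
      ;; `into` adds each pair to the map.
      (into {} (map (fn [i] [i i])) (range n)))

    (count (identity-map 1000000))
    ;; => 1000000
    ```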
    

    As a side note, the reason you always get 8 is that Clojure uses a clojure.lang.PersistentArrayMap (a map backed by an array) for maps with 8 or fewer entries. Once a map grows past 8 entries, it switches to clojure.lang.PersistentHashMap.

    user=> (type '{1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a})
    clojure.lang.PersistentArrayMap
    user=> (type '{1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 a 9 a})
    clojure.lang.PersistentHashMap
    

    Once you get past 8 entries, your transient map switches its backing data structure from an array of pairs (PersistentArrayMap) to a hash table (PersistentHashMap), at which point assoc! returns a new reference instead of just updating the old one. The old reference, which the original code keeps using, stays stuck at 8 entries.
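
    The switch can be observed directly, with the caveat that this depends on an implementation detail of JVM Clojure and could differ on other versions or hosts. The sketch below uses two hypothetical helpers: bashed-count repeats the incorrect pattern of discarding the assoc! result, and same-reference-per-insert records, for each insert, whether assoc! returned the very object it was given.

    ```clojure
    (defn bashed-count
      "Deliberately WRONG usage: ignores every value returned by assoc!."
      [n]
      (let [m (transient {})]
        (dotimes [i n]
          (assoc! m i i))            ; return value discarded
        (count (persistent! m))))

    (defn same-reference-per-insert
      "For each of n inserts, records whether assoc! returned the same
      transient object that was passed in."
      [n]
      (loop [t (transient {}), i 0, acc []]
        (if (< i n)
          (let [t2 (assoc! t i i)]
            (recur t2 (inc i) (conj acc (identical? t t2))))
          acc)))

    (bashed-count 100)
    ;; => 8 on JVM Clojure, matching the result in the question

    (take 9 (same-reference-per-insert 9))
    ;; On JVM Clojure the first eight entries are true and the ninth is false.
    ```

    The ninth insert is where the array-backed transient hands over to a hash-map-backed one, which is why code that holds on to the original reference never sees more than 8 entries.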
