mapcat breaking the laziness

Submitted by 浪尽此生 on 2019-11-30 19:08:06

Producing and consuming lazy sequences is not the same thing as lazy evaluation.

Clojure functions do strict/eager evaluation of their arguments. Evaluation of an argument that is or that yields a lazy sequence does not force realization of the yielded lazy sequence in and of itself. However, any side effects caused by evaluation of the argument will occur.

The ordinary use case for mapcat is to concatenate sequences yielded without side effects. Therefore, it hardly matters that some of the arguments are eagerly evaluated because no side effects are expected.

Your function my-mapcat imposes additional laziness on the evaluation of its arguments by wrapping them in thunks (other lazy-seqs). This can be useful when significant side effects (IO, heavy memory consumption, state updates) are expected. However, warning bells should go off if your function both performs side effects and produces a sequence to be concatenated: that is a sign your code probably needs refactoring.

Here is a similar function from algo.monads:

(defn- flatten*
  "Like #(apply concat %), but fully lazy: it evaluates each sublist
   only when it is needed."
  [ss]
  (lazy-seq
    (when-let [s (seq ss)]
      (concat (first s) (flatten* (rest s))))))
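
To see what "only when it is needed" means in practice, here is a minimal sketch that runs a copy of flatten* over a mapped list and counts how often the mapped function is called. The names calls, noisy and result are made up for this illustration. The input is a list because map does not chunk over a list; with a chunked source such as a vector, noisy would run for a whole chunk at once.

```clojure
;; Copy of flatten* from above (public here so it can be called directly).
(defn flatten* [ss]
  (lazy-seq
    (when-let [s (seq ss)]
      (concat (first s) (flatten* (rest s))))))

;; Hypothetical helper: counts its invocations instead of printing.
(def calls (atom 0))
(defn noisy [n] (swap! calls inc) (repeat n n))

(def result (flatten* (map noisy (list 1 2 3))))

(println @calls)          ; prints 0, nothing has run yet
(println (first result))  ; prints 1, forcing only the first sublist
(println @calls)          ; prints 1
(println (doall result))  ; prints (1 2 2 3 3 3)
(println @calls)          ; prints 3
```

The call count grows only as the result is consumed, which is the behaviour the docstring promises.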

Another way to write my-mapcat:

(defn my-mapcat [f coll] (for [x coll, fx (f x)] fx))
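
A quick way to check that this version is lazy is to feed it an endless input. The sketch below repeats the definition so it runs on its own; the anonymous function and the use of iterate are just illustrative choices.

```clojure
(defn my-mapcat [f coll] (for [x coll, fx (f x)] fx))

;; (iterate inc 1) never ends, so this only returns if my-mapcat is lazy.
(println (take 5 (my-mapcat #(repeat % %) (iterate inc 1))))
;; prints (1 2 2 3 3)
```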

Applying a function to a lazy sequence forces realization of as much of that sequence as is needed to satisfy the function's parameters. If the function itself returns lazy sequences, those are not realized as a matter of course.

Consider this function, which counts the realized portion of a sequence:

(defn count-realized [s] 
  (loop [s s, n 0] 
    (if (instance? clojure.lang.IPending s)
      (if (and (realized? s) (seq s))
        (recur (rest s) (inc n))
        n)
      (if (seq s)
        (recur (rest s) (inc n))
        n))))

Now let's see what's being realized

(let [seq-of-seqs (map range (list 1 2 3 4 5 6))
      concat-seq (apply concat seq-of-seqs)]
  (println "seq-of-seqs: " (count-realized seq-of-seqs))
  (println "concat-seq: " (count-realized concat-seq))
  (println "seqs-in-seq: " (mapv count-realized seq-of-seqs)))          

 ;=> seq-of-seqs:  4
 ;   concat-seq:  0
 ;   seqs-in-seq:  [0 0 0 0 0 0]

So, 4 elements of the seq-of-seqs got realized, but none of its component sequences were realized nor was there any realization in the concatenated sequence.

Why 4? The arity of concat that applies here is the variadic one, [x y & zs], and in these runs apply realizes two elements beyond the required parameters x and y (a handy mnemonic: count the & as well).

Compare that with functions that take a different number of required parameters:

(let [seq-of-seqs (map range (list 1 2 3 4 5 6))
      foo-seq (apply (fn foo [& more] more) seq-of-seqs)]
  (println "seq-of-seqs: " (count-realized seq-of-seqs))
  (println "seqs-in-seq: " (mapv count-realized seq-of-seqs)))

;=> seq-of-seqs:  2
;   seqs-in-seq:  [0 0 0 0 0 0]

(let [seq-of-seqs (map range (list 1 2 3 4 5 6))
      foo-seq (apply (fn foo [a b c & more] more) seq-of-seqs)]
  (println "seq-of-seqs: " (count-realized seq-of-seqs))
  (println "seqs-in-seq: " (mapv count-realized seq-of-seqs)))

;=> seq-of-seqs:  5
;   seqs-in-seq:  [0 0 0 0 0 0]

Clojure has two ways to make the evaluation of arguments lazy.

One is macros. Unlike functions, macros do not evaluate their arguments.

Here's a function with a side effect

(defn f [n] (println "foo!") (repeat n n))

Side effects are produced even though the sequence is not realized

user=> (def x (concat (f 1) (f 2)))
foo!
foo!
#'user/x
user=> (count-realized x)
0

Clojure has a lazy-cat macro to prevent this

user=> (def y (lazy-cat (f 1) (f 2)))
#'user/y
user=> (count-realized y)
0
user=> (dorun y)
foo!
foo!
nil
user=> (count-realized y)
3
user=> y
(1 2 2)
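
The reason lazy-cat can defer the side effects is visible in its expansion. The sketch below only inspects the macro with macroexpand-1; the form shown in the comment is what I expect from the clojure.core definition of lazy-cat, so treat it as an assumption if your Clojure version differs.

```clojure
;; Each argument form is wrapped in lazy-seq before concat ever sees it,
;; so (f 1) and (f 2) are not evaluated until the result is consumed.
(println (macroexpand-1 '(lazy-cat (f 1) (f 2))))
;; prints (clojure.core/concat (clojure.core/lazy-seq (f 1)) (clojure.core/lazy-seq (f 2)))
```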

Unfortunately, you cannot apply a macro.

The other way to delay evaluation is to wrap the expression in a thunk, which is exactly what you've done.
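
As a rough sketch of the thunk approach (not your exact my-mapcat, which I am not reproducing here), each call to f can be wrapped in its own lazy-seq before the pieces are handed to apply and concat. The names thunked-mapcat, calls and noisy are hypothetical.

```clojure
;; Hypothetical helper: counts its invocations instead of printing.
(def calls (atom 0))
(defn noisy [n] (swap! calls inc) (repeat n n))

;; Wrap each (f x) in lazy-seq, so apply and concat can take hold of
;; the wrapper without calling f.
(defn thunked-mapcat [f coll]
  (apply concat (map (fn [x] (lazy-seq (f x))) coll)))

(def result (thunked-mapcat noisy (list 1 2 3)))

(println @calls)          ; prints 0, f has not been called yet
(println (doall result))  ; prints (1 2 2 3 3 3)
(println @calls)          ; prints 3
```

apply still realizes a few elements of the mapped sequence, but those elements are only unrealized wrappers, so the side effects in f stay deferred until the concatenated result is consumed.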

Your premise is wrong. Concat is lazy, apply is lazy if its first argument is, and mapcat is lazy.

user> (class (mapcat (fn [x y] (println x y) (list x y)) (range) (range)))
0 0
1 1
2 2
3 3
clojure.lang.LazySeq

Note that some of the initial values are evaluated (more on this below), but the whole thing is clearly still lazy. Otherwise the call would never have returned: (range) yields an endless sequence, and consuming it eagerly would not terminate.

The blog you link to is about the danger of recursively using mapcat on a lazy tree, because it is eager on the first few elements (which can add up in a recursive application).
