mapcat breaking the lazyness

I have a function that produces lazy-sequences called a-function.

If I run the code:

(map a-function a-sequence-of-values)

it returns a lazy sequence as expected.

But when I run the code:

(mapcat a-function a-sequence-of-values)

it breaks the lazyness of my function. In fact it turns that code into

(apply concat (map a-function a-sequence-of-values))

So it needs to realize all the values from the map before concatenating those values.

What I need is a function that concatenates the result of a map function on demand without realizing all the map beforehand.

I can hack a function for this:

(defn my-mapcat
  [f coll]
  (lazy-seq
   (if (not-empty coll)
     (concat
      (f (first coll))
      (my-mapcat f (rest coll))))))

But I can't believe that clojure doesn't have something already done. Do you know if clojure has such feature? Only a few people and I have the same problem?

I also found a blog that deals with the same issue: http://clojurian.blogspot.com.br/2012/11/beware-of-mapcat.html

Lazy-sequence production and consumption is different than lazy evaluation.

Clojure functions do strict/eager evaluation of their arguments. Evaluation of an argument that is or that yields a lazy sequence does not force realization of the yielded lazy sequence in and of itself. However, any side effects caused by evaluation of the argument will occur.

The ordinary use case for mapcat is to concatenate sequences yielded without side effects. Therefore, it hardly matters that some of the arguments are eagerly evaluated because no side effects are expected.

Your function my-mapcat imposes additional laziness on the evaluation of its arguments by wrapping them in thunks (other lazy-seqs). This can be useful when significant side effects - IO, significant memory consumption, state updates - are expected. However, the warning bells should probably be going off in your head if your function is doing side effects and producing a sequence to be concatenated that your code probably needs refactoring.

Here is similar from algo.monads

(defn- flatten*
  "Like #(apply concat %), but fully lazy: it evaluates each sublist
   only when it is needed."
  [ss]
  (lazy-seq
    (when-let [s (seq ss)]
      (concat (first s) (flatten* (rest s))))))

Another way to write my-mapcat:

(defn my-mapcat [f coll] (for [x coll, fx (f x)] fx))

Applying a function to a lazy sequence will force realization of a portion of that lazy sequence necessary to satisfy the arguments of the function. If that function itself produces lazy sequences as a result, those are not realized as a matter of course.

Consider this function to count the realized portion of a sequence

(defn count-realized [s] 
  (loop [s s, n 0] 
    (if (instance? clojure.lang.IPending s)
      (if (and (realized? s) (seq s))
        (recur (rest s) (inc n))
        n)
      (if (seq s)
        (recur (rest s) (inc n))
        n))))

Now let's see what's being realized

(let [seq-of-seqs (map range (list 1 2 3 4 5 6))
      concat-seq (apply concat seq-of-seqs)]
  (println "seq-of-seqs: " (count-realized seq-of-seqs))
  (println "concat-seq: " (count-realized concat-seq))
  (println "seqs-in-seq: " (mapv count-realized seq-of-seqs)))          

 ;=> seq-of-seqs:  4
 ;   concat-seq:  0
 ;   seqs-in-seq:  [0 0 0 0 0 0]

So, 4 elements of the seq-of-seqs got realized, but none of its component sequences were realized nor was there any realization in the concatenated sequence.

Why 4? Because the applicable arity overloaded version of concat takes 4 arguments [x y & xs] (count the &).

Compare to

(let [seq-of-seqs (map range (list 1 2 3 4 5 6))
      foo-seq (apply (fn foo [& more] more) seq-of-seqs)]
  (println "seq-of-seqs: " (count-realized seq-of-seqs))
  (println "seqs-in-seq: " (mapv count-realized seq-of-seqs)))

;=> seq-of-seqs:  2
;   seqs-in-seq:  [0 0 0 0 0 0]

(let [seq-of-seqs (map range (list 1 2 3 4 5 6))
      foo-seq (apply (fn foo [a b c & more] more) seq-of-seqs)]
  (println "seq-of-seqs: " (count-realized seq-of-seqs))
  (println "seqs-in-seq: " (mapv count-realized seq-of-seqs)))

;=> seq-of-seqs:  5
;   seqs-in-seq:  [0 0 0 0 0 0]

Clojure has two solutions to making the evaluation of arguments lazy.

One is macros. Unlike functions, macros do not evaluate their arguments.

Here's a function with a side effect

(defn f [n] (println "foo!") (repeat n n))

Side effects are produced even though the sequence is not realized

user=> (def x (concat (f 1) (f 2)))
foo!
foo!
#'user/x
user=> (count-realized x)
0

Clojure has a lazy-cat macro to prevent this

user=> (def y (lazy-cat (f 1) (f 2)))
#'user/y
user=> (count-realized y)
0
user=> (dorun y)
foo!
foo!
nil
user=> (count-realized y)
3
user=> y
(1 2 2)

Unfortunately, you cannot apply a macro.

The other solution to delay evaluation is wrap in thunks, which is exactly what you've done.

Your premise is wrong. Concat is lazy, apply is lazy if its first argument is, and mapcat is lazy.

user> (class (mapcat (fn [x y] (println x y) (list x y)) (range) (range)))
0 0
1 1
2 2
3 3
clojure.lang.LazySeq

note that some of the initial values are evaluated (more on this below), but clearly the whole thing is still lazy (or the call would never have returned, (range) returns an endless sequence, and will not return when used eagerly).

The blog you link to is about the danger of recursively using mapcat on a lazy tree, because it is eager on the first few elements (which can add up in a recursive application).

来源：https://stackoverflow.com/questions/21943577/mapcat-breaking-the-lazyness

标签

clojure

concat

lazy-sequences