Walk through a list split function in Haskell

Question

This is a follow up to my previous question.

I am trying to understand the list splitting example in Haskell from here:

foldr (\a ~(x,y) -> (a:y,x)) ([],[])

I can read Haskell and know what foldr is, but I don't understand this code. Could you walk me through it and explain it in more detail?

Answer 1:

Let’s try running this function on a sample input list, say [1,2,3,4,5]:

We start with foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [1,2,3,4,5]. Because foldr combines from the right, the initial ([],[]) is first paired with the last element of the list: a = 5 and (x,y) = ([],[]), so (a:y,x) returns ([5],[]).
The next element, moving left, is a = 4, and (x,y) = ([5],[]), so (a:y,x) = ([4],[5]). Note that the order of the lists has swapped. Each step will swap the lists again; however, the new element always ends up at the front of the first list of the result, which is how the splitting works.
The next element is a = 3, and (x,y) = ([4],[5]), so (a:y,x) = ([3,5],[4]).
The next element is a = 2, and (x,y) = ([3,5],[4]), so (a:y,x) = ([2,4],[3,5]).
The last element to be combined is a = 1, and (x,y) = ([2,4],[3,5]), so (a:y,x) = ([1,3,5],[2,4]).
There are no more elements left, so the return value is ([1,3,5],[2,4]).

As the walkthrough shows, the split function works by maintaining two lists, swapping them at each step, and prepending each element of the input to alternating lists.
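One way to check a trace like this is to let scanr record every intermediate pair. The following is a minimal sketch; the names splitAlt and steps are hypothetical and do not appear in the original code.

```haskell
-- Hypothetical name for the fold from the question.
splitAlt :: [a] -> ([a], [a])
splitAlt = foldr (\a ~(x, y) -> (a : y, x)) ([], [])

-- Every intermediate accumulator of the same fold, ordered from the
-- initial ([],[]) (right end of the input) to the final result.
steps :: [a] -> [([a], [a])]
steps = reverse . scanr (\a ~(x, y) -> (a : y, x)) ([], [])
```

In GHCi, mapM_ print (steps [1,2,3,4,5]) lists the pairs starting at ([],[]) and ending at ([1,3,5],[2,4]), which is also the value of splitAlt [1,2,3,4,5].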

Answer 2:

Let's look at an example with the list [1, 4, 2, 5]. If we process this list, foldr is evaluated as:

foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [1,4,2,5]

So here a is first bound to the first item of the list, and the call thus returns something like:

(1:y, x)
    where (x, y) = foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [4,2,5]

Notice that the (x, y) tuple is swapped here: a is prepended to y, and the result becomes the first item of the 2-tuple. Expanding the recursive call once more gives:

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [2,5]

and if we keep doing that, we thus obtain:

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = (2:y'', x'')
          (x'', y'') = (5:y''', x''')
          (x''', y''') = foldr (\a ~(x,y) -> (a:y,x)) ([],[]) []

Since we have reached the end of the list, foldr … ([], []) [] evaluates to the 2-tuple ([], []):

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = (2:y'', x'')
          (x'', y'') = (5:y''', x''')
          (x''', y''') = ([],[])

So x''' = [] and y''' = [], which resolves to:

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = (2:y'', x'')
          (x'', y'') = (5:[], [])
          (x''', y''') = ([],[])

so x'' = [5] and y'' = []:

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = (2:[], [5])
          (x'', y'') = (5:[], [])
          (x''', y''') = ([],[])

so x' = [2] and y' = [5]:

(1:y, x)
    where (x, y) = (4:[5], [2])
          (x', y') = (2:[], [5])
          (x'', y'') = (5:[], [])
          (x''', y''') = ([],[])

so x = [4, 5] and y = [2], and we eventually obtain:

(1:[2], [4,5])
    where (x, y) = (4:[5], [2])
          (x', y') = (2:[], [5])
          (x'', y'') = (5:[], [])
          (x''', y''') = ([],[])

so the result is the expected ([1,2], [4,5]).
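The chain of where bindings above happens to be valid Haskell as written, so it can be evaluated directly. A small sketch, with unrolled as a hypothetical name and Int chosen only to fix the element type:

```haskell
-- The fully expanded fold for [1,4,2,5], written as mutually
-- dependent (lazy) pattern bindings.
unrolled :: ([Int], [Int])
unrolled = (1 : y, x)
  where
    (x, y)       = (4 : y', x')
    (x', y')     = (2 : y'', x'')
    (x'', y'')   = (5 : y''', x''')
    (x''', y''') = ([], [])
```

Evaluating unrolled gives ([1,2],[4,5]), the same value as the fold applied to [1,4,2,5].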

Answer 3:

Let's translate the fold away.

splatter :: [a] -> ([a], [a])
splatter = foldr (\a ~(x,y) -> (a:y,x)) ([],[])

What does this mean? For lists, foldr is defined as

foldr :: (a -> r -> r) -> r -> [a] -> r
foldr k z = go
  where
    go [] = z
    go (p : ps) = p `k` go ps

Let's inline it and simplify:

splatter = go
  where
    go [] = ([], [])
    go (p : ps) =
      (\a ~(x,y) -> (a:y,x)) p (go ps)

splatter = go
  where
    go [] = ([], [])
    go (p : ps) =
      (\ ~(x,y) -> (p:y,x)) (go ps)

splatter = go
  where
    go [] = ([], [])
    go (p : ps) =
      let (x, y) = go ps
      in (p : y, x)

The lazy-by-default pattern match in the let means that we don't actually make the recursive call until someone forces x or y.

The key thing to notice is that x and y swap places on each recursive call. This leads to the alternating pattern.
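To see what the lazy let buys, the sketch below puts the let version next to a variant that matches with case, which forces the recursive call before the pair is built. The names splatterLet and splatterCase are hypothetical.

```haskell
-- Same shape as the final 'go' above: the let pattern is lazy.
splatterLet :: [a] -> ([a], [a])
splatterLet []       = ([], [])
splatterLet (p : ps) =
  let (x, y) = splatterLet ps
  in (p : y, x)

-- A case match is strict: the recursive call runs to completion
-- before the outer pair is returned.
splatterCase :: [a] -> ([a], [a])
splatterCase []       = ([], [])
splatterCase (p : ps) =
  case splatterCase ps of
    (x, y) -> (p : y, x)
```

Both agree on finite lists. On an infinite list such as [1..], take 3 (fst (splatterLet [1..])) yields [1,3,5], while the case version never gets as far as producing a pair.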

Answer 4:

Approximately,

foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [a,b,c,d,e]
=
let g a ~(x,y) = (a:y,x) in
g a $ g b $ g c $ g d $ g e ([],[])
=
g a $ g b $ g c $ g d $ ([e],[])
=
g a $ g b $ g c $ ([d],[e])
=
g a $ g b $ ([c,e],[d])
=
g a $ ([b,d],[c,e])
=
([a,c,e],[b,d])

But truly,

foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [a,b,c,d,e]
=
let g a ~(x,y) = (a:y,x) in
g a $ foldr g ([],[]) [b,c,d,e]
=
(a:y,x) where 
    (x,y) = foldr g ([],[]) [b,c,d,e]
=
(a:y,x) where 
    (x,y) = (b:y2,x2) where
                 (x2,y2) = foldr g ([],[]) [c,d,e]
=
(a:y,x) where 
    (x,y) = (b:y2,x2) where
                 (x2,y2) = (c:y3,x3) where
                                (x3,y3) = (d:y4,x4) where
                                               (x4,y4) = (e:y5,x5) where
                                                              (x5,y5) = ([],[])

which is forced top-down on access (if and when that happens), being progressively fleshed out as, e.g.,

=
(a:x2,b:y2) where 
                 (x2,y2) = (c:y3,x3) where
                                (x3,y3) = (d:y4,x4) where
                                               (x4,y4) = (e:y5,x5) where
                                                              (x5,y5) = ([],[])
=
(a:c:y3,b:x3) where 
                                (x3,y3) = (d:y4,x4) where
                                               (x4,y4) = (e:y5,x5) where
                                                              (x5,y5) = ([],[])
=
(a:c:x4,b:d:y4) where 
                                               (x4,y4) = (e:y5,x5) where
                                                              (x5,y5) = ([],[])
=
(a:c:e:y5,b:d:x5) where 
                                                              (x5,y5) = ([],[])
=
(a:c:e:[],b:d:[])

but it could be that the forcing will be done in a different order, depending on how it is called, e.g.

print . (!!1) . snd $ foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [a,b,c,d,e]
print . (!!2) . fst $ foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [a,b,c,d,e]

etc.
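A runnable variant of these two calls, as a sketch with concrete characters in place of a, b, c, d, e. The name split2 is hypothetical, and the undefined tail is only there to show that nothing beyond the needed elements is forced.

```haskell
-- Hypothetical name for the fold under discussion.
split2 :: [a] -> ([a], [a])
split2 = foldr (\a ~(x, y) -> (a : y, x)) ([], [])

-- Index 1 of the second list: only the first four input cells are
-- inspected, so the undefined tail is never touched.
secondOfSnd :: Char
secondOfSnd = (!! 1) . snd $ split2 ('a' : 'b' : 'c' : 'd' : undefined)

-- Index 2 of the first list of a fully defined input.
thirdOfFst :: Char
thirdOfFst = (!! 2) . fst $ split2 "abcde"
```

Here secondOfSnd is 'd' and thirdOfFst is 'e'.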

edit: to address the questions about the lazy pattern, it is done for proper laziness of the resulting function:

foldr with a combining function that is strict in its second argument encodes recursion, which is bottom-up. The result of recursively processing the rest of the list is constructed first, and the head portion of the result is combined with it afterwards.
foldr with a combining function that is lazy in its second argument encodes corecursion, which is top-down. The head portion of the resulting value is constructed first, and the rest is filled in later. It is very reminiscent of tail recursion modulo cons (TRMC), in Prolog and elsewhere. Lazy evaluation as a concept came from "CONS should not evaluate its arguments"; TRMC does not evaluate the second argument to the constructor until later, which is what really matters.
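The difference between the two cases can be observed on a list whose tail is undefined. This is only a sketch: lazySplit, strictSplit and firstOfFirst are hypothetical names, and the exception handling exists solely to make the strict failure visible.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Maybe (listToMaybe)

lazySplit, strictSplit :: [a] -> ([a], [a])
lazySplit   = foldr (\a ~(x, y) -> (a : y, x)) ([], [])  -- lazy in the pair
strictSplit = foldr (\a  (x, y) -> (a : y, x)) ([], [])  -- strict in the pair

-- Ask for the first element of the first list when only the head of
-- the input is defined. Nothing means forcing it raised an exception.
firstOfFirst :: ([Int] -> ([Int], [Int])) -> IO (Maybe Int)
firstOfFirst f = do
  r <- try (evaluate (listToMaybe (fst (f (1 : undefined)))))
         :: IO (Either SomeException (Maybe Int))
  pure (either (const Nothing) id r)
```

firstOfFirst lazySplit returns Just 1, because the head of the result is built before the rest of the input is examined. firstOfFirst strictSplit returns Nothing, because the tuple pattern forces the fold over the undefined tail first.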

Answer 5:

So everything happens in the \a ~(x,y) -> (a:y,x) function, where in the first step a is the last item of the provided list and (x,y) is an alternating tuple accumulator that starts as ([],[]). The current element is prepended to y by a:y, and then the x and y lists in the tuple are swapped.

However, it is worth mentioning that every new element is returned on the first side of the tuple, which guarantees that the first side eventually starts with the first item of the list, since that item is prepended last.

So for the list [1,2,3,4,5,6] the steps are as follows:

a          (x   ,   y)      return
----------------------------------
6       ([]     , []     ) (6:y, x)
5       ([6]    , []     ) (5:y, x)
4       ([5]    , [6]    ) (4:y, x)
3       ([4,6]  , [5]    ) (3:y, x)
2       ([3,5]  , [4,6]  ) (2:y, x)
1       ([2,4,6], [3,5]  ) (1:y, x)
[]      ([1,3,5], [2,4,6]) no return

Regarding the tilde (~), which marks a lazy pattern, it is best described in the Haskell/Laziness topic of the Haskell guide as follows:

Prepending a pattern with a tilde sign delays the evaluation of the value until the component parts are actually used. But you run the risk that the value might not match the pattern — you're telling the compiler 'Trust me, I know it'll work out'. (If it turns out it doesn't match the pattern, you get a runtime error.) To illustrate the difference:

Prelude> let f (x,y) = 1
Prelude> f undefined
*** Exception: Prelude.undefined

Prelude> let f ~(x,y) = 1
Prelude> f undefined
1

In the first example, the value is evaluated because it has to match the tuple pattern. You evaluate undefined and get undefined, which stops the proceedings. In the latter example, you don't bother evaluating the parameter until it's needed, which turns out to be never, so it doesn't matter you passed it undefined.

Answer 6:

Effectively, the fold function alternates which list the next item from the input list is added to. A similar function in a language like Python would be

def split(xs):
    a0 = a = []
    b0 = b = []
    for x in xs:
        a.append(x)
        a, b = b, a
    return a0, b0

A lazy pattern is used for two reasons:

To allow consuming the resulting lists immediately, without waiting for foldr to consume all the input
To allow splitting of infinite lists.

Consider this example:

let (odds, evens) = foldr (\a ~(x,y) -> (a:y,x)) ([],[]) $ [1..]
in take 5 odds

The result is [1,3,5,7,9].

If you dropped the lazy pattern and used

let (odds, evens) = foldr (\a (x,y) -> (a:y,x)) ([],[]) $ [1..]
in take 5 odds

the code would never terminate, because take couldn't get the first element (let alone the first five) without first computing the entire list of odd values.
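The lazy-pattern version from the first example can be run on the infinite list as follows. A minimal sketch; oddsEvens and firstFive are hypothetical names.

```haskell
-- The fold with the lazy pattern kept in place.
oddsEvens :: [a] -> ([a], [a])
oddsEvens = foldr (\a ~(x, y) -> (a : y, x)) ([], [])

-- Both halves of an infinite input can be consumed incrementally.
firstFive :: ([Integer], [Integer])
firstFive =
  let (odds, evens) = oddsEvens [1 ..]
  in (take 5 odds, take 5 evens)
```

firstFive evaluates to ([1,3,5,7,9],[2,4,6,8,10]).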

Why is that? Consider the definition of Data.List.foldr:

foldr k z = go
  where
    go [] = z
    go (y:ys) = y `k` go ys

If k = \a (x,y) -> (a:y, x) is strict in its second argument (the tuple pattern has to be matched before anything is returned), then the evaluation of y `k` go ys cannot return until the base case of go is reached.

Using a lazy pattern, the function is equivalent to

\a p -> (a:snd p, fst p)

meaning we never have to match on p until fst or snd does so; the function is now lazy in its second argument. That means that

go (y:ys) = y `k` go ys
          = (\a p -> (a:snd p, fst p)) y (go ys)
          = let p = go ys in (y:snd p, fst p)

returns immediately without further evaluating go. Only once we ask for anything beyond the first element of the first list do we need to call go again, and then we only have to progress as far as that element requires.
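The claimed equivalence between the lazy pattern and the fst/snd form can be checked on a few inputs. A sketch with hypothetical names viaLazyPattern and viaProjections:

```haskell
viaLazyPattern, viaProjections :: [a] -> ([a], [a])
-- The original combining function, with the lazy tuple pattern.
viaLazyPattern = foldr (\a ~(x, y) -> (a : y, x)) ([], [])
-- The same function with the pattern replaced by fst and snd.
viaProjections = foldr (\a p -> (a : snd p, fst p)) ([], [])
```

Both definitions agree on finite lists, and both can be consumed lazily on infinite input: take 4 (snd (viaProjections [1..])) yields [2,4,6,8].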

Source: https://stackoverflow.com/questions/58993547/walk-through-a-list-split-function-in-haskell

Tags

list

haskell

split

fold