Do I need to take explicit actions to facilitate sharing with persistent data structures?

爱⌒轻易说出口 提交于 2019-12-06 04:07:04
Will Ness

Sharing is a compiler thing. When it recognizes common sub-expressions, a compiler may chose to represent them both by the same object in memory. But even if you use such a compiler switch (like -fno-cse), it is under no obligation to do so, and the two might be (and usually are, in the absence of the switch) represented by two different, though of equal value, objects in memory. Re: referential transparency.

OTOH when we name something and use that name twice, we (reasonably) expect it to represent the same object in memory. But compiler might choose to duplicate it and use two separate copies in two different use sites, although it is not known to do so. But it might. Re: referential transparency.

See also:


Here's few examples with list-producing functions, drawing from the last link above. They rely on the compiler not duplicating anything, i.e. indeed sharing any named object as expected from call by need lambda calculus operational semantics (as explained by nponeccop in the comments), and not introducing any extra sharing on its own to eliminate common subexpressions:

  1. Sharing fixpoint combinator, creating a loop:

    fix f = x where x = f x

  2. Non-sharing fixpoint combinator, creating telescoping multistage chain (i.e. regular recursion chain)

    _Y f = f (_Y f)

  3. Two-stages combination - a loop and a feed

    _2 f = f (fix f)

would this mean that they no longer point to the same Point in memory?

I don't think you should be concerned with this as this is just an implementation detail of the runtime system (aka RTS of Haskell) for immutable values.

As far as other suggestion is concerned, I would say make the function findSet return the Point itself rather than the key as that would eliminate the lookup in union.

findSet :: DSForest -> Int -> Point
findSet dsf' x' = case _parent pt of
                     Just (Point v _ _)  -> findSet dsf' v
                     Nothing             -> pt
                  where
                      pt = (dsf' ! x')

Make appropriate changes in the union function.

First comment: the disjoint-set union-find data structure is very, very difficult to do well in a purely functional way. If you are just trying to get practice with persistent data structures, I strongly recommend starting with simpler structures like binary search trees.

Now, to see one problem, consider your findSet function. It does not implement path compression! That is, it does not make all the nodes along the path to the root point directly to the root. To do that, you would want to update all those points in the DSForest, so your function would then return (Int, DSForest) or perhaps (Point, DSForest). Doing this in a monad to handle all the plumbing of passing the DSForest around be easier than passing that forest around manually.

But now a second issue. Suppose you modify findSet as just described. It still wouldn't do quite what you want. In particular, suppose you have a chain where 2 is a child of 1, 3 is a child of 2, and 4 is a child of 3. And now you you do a findSet on 3. This will update 3's point so that its parent is 1 instead 2. But 4's parent is still the old 3 point whose parent is 2. This may not matter too much, because it looks like you never really do anything with the parent Point except pull out its value (in findSet). But the very fact that you never do anything with the parent Point except pull out its value says to me that it should be a Maybe Int instead of a Maybe Point.

Let me repeat and expand on what I said at the beginning. Disjoint sets are a particularly hard data structure to handle in a functional/persistent way, so I strongly recommend starting with an easier tree structure like binary search trees or leftist heaps or even abstract syntax trees. Those structures have the property that all access goes through the root--that is, you always start at the root and work your way down through the tree to get to the right place. This property makes the kind of sharing that is the hallmark of persistent data structures MUCH easier.

The disjoint set data structure does not have that property. Instead of always starting at the root and working down to the nodes of interest, you start at arbitrary nodes and work your way back up to the root. When you have unrestricted entry points like this, often the easiest way to handle it is to mediate all the sharing through a separate map (DSForest in your case), but that means passing that map back and forth everywhere.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!