How does Data.MemoCombinators work?

Submitted by 北战南征 on 2019-11-27 03:13:08

This library is a straightforward combinatorization of the well-known technique of memoization. Let's start with the canonical example:

fib = (map fib' [0..] !!)
    where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n-1) + fib (n-2)

I interpret what you said to mean that you know how and why this works. So I'll focus on the combinatorization.

We are essentially trying to capture and generalize the idea of (map f [0..] !!). The type of this function is (Int -> r) -> (Int -> r), which makes sense: it takes a function from Int -> r and returns a memoized version of the same function. Any function which is semantically the identity and has this type is called a "memoizer for Int" (even id, which doesn't memoize). We generalize to this abstraction:

type Memo a = forall r. (a -> r) -> (a -> r)

So a Memo a, a memoizer for a, takes a function from a to anything, and returns a semantically identical function that has been memoized (or not).
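To make the abstraction concrete, here is a minimal sketch of the type together with two memoizers for Int. The names listNat and noMemo are made up for this illustration and are not part of the library. listNat is just the list trick from the fib example, so it only handles non-negative arguments.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A memoizer for 'a': semantically the identity on functions out of 'a'.
type Memo a = forall r. (a -> r) -> (a -> r)

-- Hypothetical name: the list-based memoizer from the fib example.
-- Only valid for non-negative arguments; lookup is linear in the index.
listNat :: Memo Int
listNat f = (map f [0 ..] !!)

-- Hypothetical name: the trivial memoizer, which caches nothing.
noMemo :: Memo Int
noMemo = id

-- The canonical example, rewritten against the abstraction.
fib :: Int -> Integer
fib = listNat fib'
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
```

Swapping listNat for noMemo in fib gives the same results, only exponentially slower, which is the sense in which both are "memoizers for Int".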

The idea of the different memoizers is to find a way to enumerate the domain with a data structure, map the function over them, and then index the data structure. bool is a good example:

bool :: Memo Bool
bool f = table (f True, f False)
    where
    table (t,f) True = t
    table (t,f) False = f

Functions from Bool are equivalent to pairs, except a pair will only evaluate each component once (as is the case for every value that occurs outside a lambda). So we just map to a pair and back. The essential point is that we are lifting the evaluation of the function above the lambda for the argument (here the last argument of table) by enumerating the domain.

Memoizing Maybe a is a similar story, except now we need to know how to memoize a for the Just case. So the memoizer for Maybe takes a memoizer for a as an argument:

maybe :: Memo a -> Memo (Maybe a)
maybe ma f = table (f Nothing, ma (f . Just))
    where
    table (n,j) Nothing = n
    table (n,j) (Just x) = j x

The rest of the library is just variations on this theme.
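As an illustration of what such variations can look like, here is a sketch of memoizers for sums and products in the same style. It is written for this explanation and is not the library's source; the names boolMemo, eitherMemo and pairMemo are chosen here to avoid clashing with the Prelude.

```haskell
{-# LANGUAGE RankNTypes #-}

type Memo a = forall r. (a -> r) -> (a -> r)

-- Same idea as 'bool' above, repeated so the sketch is self-contained.
boolMemo :: Memo Bool
boolMemo f = table (f True, f False)
  where
    table (t, _) True  = t
    table (_, e) False = e

-- Sum type: one memoized function per constructor, built outside the lambda.
eitherMemo :: Memo a -> Memo b -> Memo (Either a b)
eitherMemo ma mb f = table (ma (f . Left), mb (f . Right))
  where
    table (l, _) (Left x)  = l x
    table (_, r) (Right y) = r y

-- Product type: memoize on the first component, and let each cached
-- result be a function that is itself memoized on the second component.
pairMemo :: Memo a -> Memo b -> Memo (a, b)
pairMemo ma mb f = uncurry (ma (\x -> mb (\y -> f (x, y))))

-- Usage: a memoized function on a four-element domain.
xorPair :: (Bool, Bool) -> Bool
xorPair = pairMemo boolMemo boolMemo (\(a, b) -> a /= b)
```

In each case the memoizers for the component types are passed in as arguments, exactly as with maybe.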

The way it memoizes integral types uses a more appropriate structure than [0..]. It's a bit involved, but basically just creates an infinite tree (representing the numbers in binary to elucidate the structure):

1
  10
    100
      1000
      1001
    101
      1010
      1011
  11
    110
      1100
      1101
    111
      1110
      1111

So that looking up a number in the tree has running time proportional to the number of bits in its representation.
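A simplified sketch of that tree, for positive integers only, might look like the following. This is not the representation data-inttrie actually uses; BitTree, positives, digits, lookupPos and memoPos are names invented for this example.

```haskell
-- Infinite binary tree: the node for n has the nodes for 2n and 2n+1 as
-- children, i.e. n's binary digits followed by a 0 or a 1.
data BitTree a = Node a (BitTree a) (BitTree a)

instance Functor BitTree where
  fmap f (Node x l r) = Node (f x) (fmap f l) (fmap f r)

-- The tree that holds every positive integer at its own position.
positives :: BitTree Integer
positives = go 1
  where
    go n = Node n (go (2 * n)) (go (2 * n + 1))

-- Binary digits of a positive integer, most significant first.
digits :: Integer -> [Bool]
digits = reverse . go
  where
    go 0 = []
    go k = odd k : go (k `div` 2)

-- Walk down the tree: skip the leading 1 (the root), then go left on a 0
-- digit and right on a 1 digit. The number of steps is the bit length.
lookupPos :: BitTree a -> Integer -> a
lookupPos tree n
  | n < 1     = error "lookupPos: argument must be positive"
  | otherwise = walk tree (drop 1 (digits n))
  where
    walk (Node x _ _) []       = x
    walk (Node _ l r) (b : bs) = walk (if b then r else l) bs

-- Memoizer for functions on positive integers.
memoPos :: (Integer -> r) -> (Integer -> r)
memoPos f = lookupPos (fmap f positives)
```

Because the tree is lazy, only the nodes on the paths that are actually looked up ever get built.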

As sclv points out, Conal's MemoTrie library uses the same underlying technique, but uses a typeclass presentation instead of a combinator presentation. We released our libraries independently at the same time (indeed, within a couple hours!). Conal's is easier to use in simple cases (there is only one function, memo, and it will determine the memo structure to use based on the type), whereas mine is more flexible, as you can do things like this:

boundedMemo :: Integer -> Memo Integer
boundedMemo bound f = \z -> if z < bound then memof z else f z
   where
   memof = integral f

This memoizes only values less than a given bound, which was needed for the implementation of one of the Project Euler problems.

There are other approaches, for example exposing an open fixpoint function over a monad:

memo :: MonadState ... m => ((Integer -> m r) -> (Integer -> m r)) -> m (Integer -> m r)

This allows yet more flexibility, e.g. purging caches, LRU, etc. But it is a pain in the ass to use, and it also puts strictness constraints on the function to be memoized (e.g. no infinite left recursion). I don't believe there are any libraries that implement this technique.
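For a rough idea of what the open-fixpoint style looks like, here is a sketch under simplifying assumptions: the state is fixed to a Data.Map cache, the memoized function is returned directly rather than inside m, and it relies on the mtl and containers packages. memoFix, fibOpen and runFib are hypothetical names, not an existing API.

```haskell
import Control.Monad.State (State, evalState, gets, modify)
import qualified Data.Map as Map

type Cache r = Map.Map Integer r

-- Tie the knot of an open-recursive function through a cache in State.
-- Every recursive call goes through 'go', which checks the cache first.
memoFix :: ((Integer -> State (Cache r) r) -> Integer -> State (Cache r) r)
        -> Integer -> State (Cache r) r
memoFix open = go
  where
    go n = do
      hit <- gets (Map.lookup n)
      case hit of
        Just v  -> return v
        Nothing -> do
          v <- open go n
          modify (Map.insert n v)
          return v

-- fib with the recursive call abstracted out as the first argument.
fibOpen :: Monad m => (Integer -> m Integer) -> Integer -> m Integer
fibOpen _    0 = return 0
fibOpen _    1 = return 1
fibOpen self n = (+) <$> self (n - 1) <*> self (n - 2)

-- The cache lives only for the duration of one top-level call.
runFib :: Integer -> Integer
runFib n = evalState (memoFix fibOpen n) Map.empty
```

Since the cache is an explicit value, purging or bounding it is just a matter of changing what modify does, but every function to be memoized has to be written in this monadic, open-recursive form.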

Did that answer what you were curious about? If not, perhaps make explicit the points you are confused about?

The heart is the bits function:

-- | Memoize an ordered type with a bits instance.
bits :: (Ord a, Bits a) => Memo a
bits f = IntTrie.apply (fmap f IntTrie.identity)

It is the only function (except the trivial unit :: Memo ()) which can give you a Memo a value. It uses the same idea as in this page about Haskell memoization. Section 2 shows the simplest memoization strategy using a list and section 3 does the same using a binary tree of naturals similar to the IntTrie used in memocombinators.

The basic idea is to use a construction like (map fib [0 ..] !!) or, in the memocombinators case, IntTrie.apply (fmap f IntTrie.identity). The thing to notice here is the correspondence between IntTrie.apply and !!, and also between IntTrie.identity and [0..].

The next step is memoizing functions with other types of arguments. This is done with the wrap function which uses an isomorphism between types a and b to construct a Memo b from a Memo a. For example:

Memo.integral f
=>
wrap fromInteger toInteger bits f
=>
bits (f . fromInteger) . toInteger
=>
IntTrie.apply (fmap (f . fromInteger) IntTrie.identity) . toInteger
~> (semantically equivalent)
(map (f . fromInteger) [0..] !!) . toInteger
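The same expansion can be tried out without the library. In this sketch, wrap is defined to match the expansion above (wrap i j m f = m (f . i) . j), and a list-based memoizer stands in for bits; natList and charMemo are names made up for the example.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Char (chr, ord, toUpper)

type Memo a = forall r. (a -> r) -> (a -> r)

-- Stand-in for 'bits': a list-based memoizer for non-negative Ints.
natList :: Memo Int
natList f = (map f [0 ..] !!)

-- Transport a memoizer along a pair of inverse functions:
-- convert the argument with j, look it up, and let f see it through i.
wrap :: (a -> b) -> (b -> a) -> Memo a -> Memo b
wrap i j m f = m (f . i) . j

-- A memoizer for Char obtained from the one for Int.
charMemo :: Memo Char
charMemo = wrap chr ord natList

-- Usage: a memoized function on characters.
shout :: Char -> Char
shout = charMemo toUpper
```

Here chr and ord play the role that fromInteger and toInteger play for Memo.integral.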

The rest of the source code deals with types like List, Maybe, Either and memoizing multiple arguments.

Some of the work is done by IntTrie: http://hackage.haskell.org/package/data-inttrie-0.0.4

Luke's library is a variation of Conal's MemoTrie library, which he described here: http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/

Some further expansion -- the general notion behind functional memoization is to take a function from a -> b and map it across a datastructure indexed by all possible values of a and containing values of b. Such a datastructure should be lazy in two ways -- first it should be lazy in the values it holds. Second, it should be lazily produced itself. The former is by default in a nonstrict language. The latter is accomplished by using generalized tries.

The various approaches of memocombinators, memotrie, etc are all just ways of creating compositions of pieces of tries over individual types of datastructures to allow for the simple construction of tries for increasingly complex structures.
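As a small illustration of a trie that is lazy in both senses, here is a sketch of a memo structure for functions on lists of Bool. The type and function names (BoolListTrie, tabulate, apply, memoBools) are invented for this example and are not taken from either library.

```haskell
-- One node per list prefix: the result for the list ending here, then the
-- sub-tries for lists that continue with False and with True.
data BoolListTrie r = BoolListTrie r (BoolListTrie r) (BoolListTrie r)

-- Map the function across every possible key. The structure is infinite,
-- so this only works because both the nodes and the values are lazy.
tabulate :: ([Bool] -> r) -> BoolListTrie r
tabulate f =
  BoolListTrie (f [])
               (tabulate (f . (False :)))
               (tabulate (f . (True :)))

-- Index the trie by walking down one level per list element.
apply :: BoolListTrie r -> [Bool] -> r
apply (BoolListTrie here _ _)  []           = here
apply (BoolListTrie _ onF _)   (False : bs) = apply onF bs
apply (BoolListTrie _ _ onT)   (True : bs)  = apply onT bs

-- The memoizer: build the table once, then index it.
memoBools :: ([Bool] -> r) -> ([Bool] -> r)
memoBools f = apply (tabulate f)

-- Usage: a memoized function on bit strings.
countTrue :: [Bool] -> Int
countTrue = memoBools (length . filter id)
```

The trie for a list is built from the trie for Bool (two branches) applied recursively, which is the kind of composition both libraries automate.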

@luqui One thing that is not clear to me: does this have the same operational behaviour as the following:

fib :: [Int]
fib = map fib' [0..]
    where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib!!(n-1) + fib!!(n-2)

The above should memoize fib at the top level, and hence if you define a function such as:

f n = fib!!n + fib!!(n+1)

If we then compute f 5, we obtain that fib 5 is not recomputed when computing fib 6. It is not clear to me whether the memoization combinators have the same behaviour (i.e. top-level memoization instead of only prohibiting the recomputation "inside" the fib computation), and if so, why exactly?
