Efficient version of 'inits'

隐瞒了意图╮ 2021-01-04 09:03

That is, inits "abc" == ["","a","ab","abc"]

There is a standard version of inits in Data.List, but below I have written a version

2 Answers
  • 2021-01-04 09:14

    Your myInits uses a technique called a difference list: each prefix is represented as a function that prepends it to a given tail, so appending an element is O(1) and turning the function into a list takes time linear in the prefix length. I believe, but haven't checked, that completely evaluating myInits takes O(n^2) time and O(n^2) space. Fully evaluating inits also requires O(n^2) time and space. Any version of inits will require O(n^2) space once all results are evaluated: lists built with : and [] can only share their tails, and there are no tails in common among the results of inits.

    The version of inits in Data.List uses a queue with amortized O(1) operations, much like the simpler queue described in the second half of a related answer. The Snoc referenced in the Data.List source code is a play on Cons (another name for :) spelled backwards; it appends an item to the end of the list.
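
The difference-list technique mentioned above can be sketched as follows. This is only an illustration; the names DList, emptyD, snocD, toListD and prefixUpTo are made up for this example and do not come from the question or from Data.List.

```haskell
-- A difference list represents a list as a function that prepends it
-- to whatever tail it is given. All names here are illustrative.
type DList a = [a] -> [a]

emptyD :: DList a
emptyD = id

-- Append one element at the end in O(1): compose one more function.
snocD :: DList a -> a -> DList a
snocD d x = d . (x:)

-- Turn the accumulated functions into an ordinary list.
toListD :: DList a -> [a]
toListD d = d []

-- Building a prefix by repeated snocD is linear in its length, whereas
-- repeated (++ [x]) on ordinary lists would be quadratic.
prefixUpTo :: Int -> [Int]
prefixUpTo n = toListD (foldl snocD emptyD [1 .. n])
```

Here snocD plays the same role as start . (l:) in the code discussed in this thread, and toListD corresponds to start [].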

    Briefly experimenting with these functions suggests myInits performs satisfactorily when used sparsely on a large list. On my computer, in GHCi, myInits [1..] !! 8000000 yields a result in a few seconds. Unfortunately, I have the horrifyingly inefficient implementation that shipped with GHC 7.8.3, so I can't compare myInits to inits.

    Strictness

    There is one big difference between myInits and inits and between myTails and tails. They have different meanings when applied to undefined or _|_ (pronounced "bottom", another symbol we use for undefined).

    inits has the strictness property inits (xs ++ _|_) = inits xs ++ _|_, which, when xs is the empty list [], says that inits will still yield at least one result when applied to undefined:

    inits (xs ++ _|_) = inits xs ++ _|_
    inits ([] ++ _|_) = inits [] ++ _|_
    inits        _|_  = [[]]     ++ _|_
    inits        _|_  =  []      :  _|_
    

    We can see this experimentally

    > head . inits $ undefined
    []
    

    myInits does not have this property either for the empty list or for longer lists.

    > head $ myInits undefined
    *** Exception: Prelude.undefined
    
    > take 3 $ myInits ([1,2] ++ undefined)
    [[],[1]*** Exception: Prelude.undefined
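
The failure comes from where the pattern match happens. As a minimal sketch, assume the helper matches on the list directly in its equations. The name strictInits is hypothetical and only stands in for myInits, whose source is not reproduced on this page.

```haskell
-- Hypothetical stand-in for the question's myInits: the helper matches
-- on the list in its equations, so the list's outermost constructor
-- must be known before the current prefix can be returned.
strictInits :: [a] -> [[a]]
strictInits = f id
  where
    f start (l:ls) = start [] : f (start . (l:)) ls
    f start []     = [start []]
```

With this shape, take 2 (strictInits ([1,2] ++ undefined)) still returns [[],[1]], but asking for a third prefix forces the undefined tail.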
    

    We can fix this if we realize that f in myInits would yield start [] in either branch. Therefore, we can delay the pattern matching until it is needed to decide what to do next.

    myInits' :: [a] -> [[a]]
    myInits' = f id
     where
      -- Emit the current prefix first; only then inspect the list.
      f start list = (start []):
                     case list of
                         (l:ls) -> f (start . (l:)) ls
                         []     -> []
    

    This increase in laziness makes myInits' work just like inits.

    > head $ myInits' undefined
    []
    
    > take 3 $ myInits' ([1,2] ++ undefined)
    [[],[1],[1,2]]
    

    Similarly, the difference between your myTails and tails in Data.List is that tails yields the entire list as the first result before deciding if there will be a remainder of the list. The documentation says it obeys tails _|_ = _|_ : _|_, but it actually obeys a much stronger rule that's hard to describe easily.
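
The same idea of delaying the match can be sketched for tails. The name lazyTails is hypothetical, and this illustrates the technique rather than reproducing the definition used in Data.List.

```haskell
-- Emit the whole list first, and only then inspect it to decide
-- whether there are further tails.
lazyTails :: [a] -> [[a]]
lazyTails xs = xs : case xs of
  (_:rest) -> lazyTails rest
  []       -> []
```

Because the first element is produced before xs is inspected, length (take 1 (lazyTails undefined)) evaluates to 1 instead of raising an exception.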

  • 2021-01-04 09:26

    Building the prefix functions can be separated from turning them into actual lists:

    diffInits = map ($ []) . scanl (\a x -> a . (x:)) id
    

    This is noticeably faster (tested inside GHCi), and is lazier than your version (see Cirdec's answer for the discussion):

    diffInits        _|_  == []           :  _|_
    diffInits (xs ++ _|_) == diffInits xs ++ _|_
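
To make the two stages visible, here is a sketch that names them separately and compares the result with Data.List.inits. The helper names prefixBuilders and agreesWithInits are illustrative only.

```haskell
import Data.List (inits)

-- Stage 1: one prefix-building function per prefix, each extending the
-- previous one by composition.
prefixBuilders :: [a] -> [[a] -> [a]]
prefixBuilders = scanl (\acc x -> acc . (x:)) id

-- Stage 2: apply every builder to [] to obtain the actual prefixes.
diffInits :: [a] -> [[a]]
diffInits = map ($ []) . prefixBuilders

-- Agreement with the library version on a finite input.
agreesWithInits :: Bool
agreesWithInits = diffInits "abc" == inits "abc"
```

Since scanl produces its seed before looking at the input list, head (diffInits undefined) is [] and take 3 (diffInits ([1,2] ++ undefined)) is [[],[1],[1,2]], matching the two laws stated above.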
    