What are the space complexities of inits and tails?

Submitted by 大城市里の小女人 on 2020-01-11 09:23:09

Question


TL;DR

After reading the passage about persistence in Okasaki's Purely Functional Data Structures and going over his illustrative examples of singly linked lists (which is how Haskell's lists are implemented), I was left wondering about the space complexities of Data.List's inits and tails...

It seems to me that

  • the space complexity of tails is linear in the length of its argument, and
  • the space complexity of inits is quadratic in the length of its argument,

but a simple benchmark indicates otherwise.

Rationale

With tails, the original list can be shared. Computing tails xs amounts to walking along the list xs: each suffix in the result is just a pointer into xs, so no part of xs needs to be recreated in memory.

In contrast, because each element of inits xs "ends in a different way", no such sharing is possible: every prefix of xs must be rebuilt from scratch in memory.
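To make the sharing argument concrete, here are textbook-style definitions of the two functions (my own sketch, close in spirit to the Haskell Report versions rather than to the optimized ones in base):

tails' :: [a] -> [[a]]
tails' xs = xs : case xs of
    []      -> []
    _ : xs' -> tails' xs'        -- each suffix is just a pointer into the original list

inits' :: [a] -> [[a]]
inits' []       = [[]]
inits' (x : xs) = [] : map (x :) (inits' xs)   -- every prefix is rebuilt with fresh conses

tails' allocates only the n + 1 cells of the outer result list (the suffixes themselves are shared), while inits' additionally rebuilds 0 + 1 + ... + n = n(n+1)/2 cells for the prefixes; hence the linear versus quadratic intuition.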

Benchmark

The simple benchmark below shows there isn't much of a difference in memory allocation between the two functions:

-- Main.hs

import Data.List (inits, tails)

main :: IO ()
main = do
    let intRange = [1 .. 10 ^ 4] :: [Int]
    print $ sum intRange    -- forces intRange fully before either measurement
    print $ fInits intRange
    print $ fTails intRange

-- Force every prefix by summing it, then sum the results.
fInits :: [Int] -> Int
fInits = sum . map sum . inits

-- Force every suffix by summing it, then sum the results.
fTails :: [Int] -> Int
fTails = sum . map sum . tails

After compiling my Main.hs file with

ghc -prof -fprof-auto -O2 -rtsopts Main.hs

and running

./Main +RTS -p

the Main.prof file reports the following:

COST CENTRE MODULE  %time %alloc

fInits      Main     60.1   64.9
fTails      Main     39.9   35.0

The memory allocated for fInits and the memory allocated for fTails have the same order of magnitude... Hmm...

What is going on?

  • Are my conclusions about the space complexities of tails (linear) and inits (quadratic) correct?
  • If so, why does GHC allocate roughly as much memory for fInits as for fTails? Does list fusion have something to do with this?
  • Or is my benchmark flawed?
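As a cross-check (my own aside, not part of the original question), total heap allocation can also be read from the RTS summary rather than from cost centres, by running

./Main +RTS -s

and comparing the "bytes allocated in the heap" figures of variants that call only fInits or only fTails.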

Answer 1:


The implementation of inits in the Haskell Report, which is identical or nearly identical to the implementations used up to base 4.7.0.1 (GHC 7.8.3), is horribly slow. In particular, the fmap applications stack up recursively, so forcing successive elements of the result gets slower and slower: element k of the result sits under k pending layers of fmap.

inits [1,2,3,4] = [] : fmap (1:) (inits [2,3,4])
 = [] : fmap (1:) ([] : fmap (2:) (inits [3,4]))
 = [] : [1] : fmap (1:) (fmap (2:) ([] : fmap (3:) (inits [4])))
...

The simplest asymptotically optimal implementation, explored by Bertram Felgenhauer, is based on applying take with successively larger arguments:

inits xs = [] : go (1 :: Int) xs where
  go !l (_:ls) = take l xs : go (l+1) ls  -- the !l pattern needs the BangPatterns extension
  go _  []     = []
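Each prefix here is a lazy application of take, so producing the spine of the result takes only O(n) time, and the quadratic cons cost is paid only for the prefixes that are actually forced; that, roughly, is what makes the approach asymptotically optimal.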

Felgenhauer was able to eke some extra performance out of this using a private, non-fusing version of take, but it was still not as fast as it could be.

The following very simple implementation is significantly faster in most cases:

inits = map reverse . scanl (flip (:)) []
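A quick GHCi session (my own illustration) shows what the scanl builds and why the final reverse is needed: scanl (flip (:)) accumulates each prefix in reverse order.

ghci> scanl (flip (:)) [] "abc"
["","a","ba","cba"]
ghci> map reverse (scanl (flip (:)) [] "abc")
["","a","ab","abc"]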

In some weird corner cases (like map head . inits), this simple implementation is asymptotically non-optimal. I therefore wrote a version using the same technique, but based on Chris Okasaki's Banker's queues, that is both asymptotically optimal and nearly as fast. Joachim Breitner optimized it further, primarily by using a strict scanl' rather than the usual scanl, and this implementation got into GHC 7.8.4.

inits can now produce the spine of the result in O(n) time; forcing the entire result requires O(n^2) time because none of the conses can be shared among the different initial segments.

If you want really absurdly fast inits and tails, your best bet is to use Data.Sequence; Louis Wasserman's implementation is magical. Another possibility would be to use Data.Vector, which presumably uses slicing for such things.
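For illustration (my sketch, not part of the answer), the Data.Sequence analogue of the benchmark above would look like this; Seq comes with its own inits and tails, whose segments can share internal tree structure:

import qualified Data.Sequence as Seq

main :: IO ()
main = do
    let s = Seq.fromList [1 .. 10 ^ 4] :: Seq.Seq Int
    -- sum works on Seq via Foldable; fmap maps over the outer sequence of segments
    print $ sum (fmap sum (Seq.inits s))
    print $ sum (fmap sum (Seq.tails s))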



Source: https://stackoverflow.com/questions/29393412/what-are-the-space-complexities-of-inits-and-tails
