How to find out GHC's memory representations of data types?

感情败类 2020-12-15 21:30

Recently, blog entries such as Computing the Size of a Hashmap explained how to reason about the space complexity of commonly used container types. Now I'm facing the question [...]

2 Answers
  • 2020-12-15 21:43

    Memory footprints of Haskell Data Types

    (The following applies to GHC; other compilers may use different storage conventions.)

    Rule of thumb: a constructor costs one word for a header, and one word for each field. Exception: a constructor with no fields (like Nothing or True) takes no space, because GHC creates a single instance of these constructors and shares it amongst all uses.

    A word is 4 bytes on a 32-bit machine, and 8 bytes on a 64-bit machine.

    So e.g.

    data Uno a   = Uno a    -- type variables must be bound on the left-hand side
    data Due a b = Due a b
    

    an Uno takes 2 words (header plus one field), and a Due takes 3 (header plus two fields).
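
    The rule of thumb above can be written down as a small calculation. This is only a sketch: constructorWords and constructorBytes are hypothetical helper names, not GHC functions, and they ignore unpacked fields, profiling headers, and the size of whatever the fields point to.

```haskell
import Foreign.Storable (sizeOf)

-- | Size of one machine word in bytes on the current platform
-- (GHC's Int is one word wide).
wordBytes :: Int
wordBytes = sizeOf (undefined :: Int)

-- | Rule-of-thumb size of a constructor closure in words:
-- one header word plus one word per field. Nullary constructors
-- are shared static closures, so they are counted as zero heap words.
constructorWords :: Int -> Int
constructorWords nFields
  | nFields <= 0 = 0
  | otherwise    = 1 + nFields

-- | The same estimate in bytes.
constructorBytes :: Int -> Int
constructorBytes nFields = constructorWords nFields * wordBytes
```

    With these definitions, constructorWords 1 gives 2 (an Uno) and constructorWords 2 gives 3 (a Due).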

    Also, I believe it is possible to write a Haskell function that performs the same task as sizeof or offsetof.
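
    For the marshalled (C-style) representation, as opposed to the heap closure, the Foreign.Storable class already provides sizeOf and alignment. An offsetof equivalent can be sketched on top of those. In the sketch below, alignUp, fieldOffsets, sizeAndAlign and exampleOffsets are hypothetical helpers, and the layout rule (each field placed at the next multiple of its alignment) is the usual C convention assumed here, not something GHC applies to heap objects.

```haskell
import Data.Int (Int32)
import Foreign.Storable (Storable, alignment, sizeOf)

-- | Round an offset up to the next multiple of the given alignment.
alignUp :: Int -> Int -> Int
alignUp off al = ((off + al - 1) `div` al) * al

-- | Given (size, alignment) pairs for the fields of a struct, in order,
-- return the byte offset of each field under the usual C layout rule.
fieldOffsets :: [(Int, Int)] -> [Int]
fieldOffsets = go 0
  where
    go _ [] = []
    go off ((sz, al) : rest) =
      let start = alignUp off al
      in start : go (start + sz) rest

-- | Size and alignment of a Storable type, taken from its instance.
-- The argument is only used for its type and is never evaluated.
sizeAndAlign :: Storable a => a -> (Int, Int)
sizeAndAlign x = (sizeOf x, alignment x)

-- | Example: offsets for a struct { int32_t; double; int32_t; }.
-- The result depends on the platform's alignment of Double.
exampleOffsets :: [Int]
exampleOffsets =
  fieldOffsets
    [ sizeAndAlign (undefined :: Int32)
    , sizeAndAlign (undefined :: Double)
    , sizeAndAlign (undefined :: Int32)
    ]
```

    Note that this measures the layout used when marshalling to foreign code; it says nothing about how GHC lays out the corresponding Haskell values on the heap.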

  • 2020-12-15 22:06

    My first idea was to use this neat little function, due to Simon Marlow:

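    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    module Size where

    import GHC.Exts
    import Foreign

    -- Approximate size in bytes of the top-level closure of a value.
    -- Written against the older result order of unpackClosure#
    -- (info pointer, pointer array, non-pointer byte array).
    unsafeSizeof :: a -> Int
    unsafeSizeof a =
      case unpackClosure# a of
        (# x, ptrs, nptrs #) ->
          sizeOf (undefined::Int) + -- one word for the header
            I# (sizeofByteArray# (unsafeCoerce# ptrs) -- pointer fields
                 +# sizeofByteArray# nptrs)           -- non-pointer fields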
    

    Using it:

    Prelude> :!ghc -c Size.hs
    
    Size.hs:15:18:
        Warning: Ignoring unusable UNPACK pragma on the
                 third argument of `BitVec257'
        In the definition of data constructor `BitVec257'
        In the data type declaration for `BitVec257'
    Prelude Size> unsafeSizeof $! BitVec514 (BitVec257 1 2 True 3 4) (BitVec257 1 2 True 3 4)
    74
    

    (Note that GHC is telling you that it cannot unbox Bool since it's a sum type.)

    The above function claims that your data type uses 74 bytes on a 64-bit machine. I find that hard to believe. I'd expect the data type to use 11 words = 88 bytes: one word for the header and one word per field. Even Bools take one word, as they are pointers to (statically allocated) constructors. I'm not quite sure what's going on here.
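
    The expected figure can be worked out explicitly. The sketch below assumes the layout implied by the session above (an inference, since the type definitions are not shown): BitVec514 has its two BitVec257 fields unpacked, giving 8 unboxed word-sized fields plus 2 pointers to Bool. closureBytesFor, expected64 and suspected64 are hypothetical names. The last definition illustrates one possible explanation for 74, offered as a guess rather than a diagnosis: if the coerced pointer array reports its element count (2) instead of its size in bytes (16), the sum comes out at 8 + 2 + 64.

```haskell
-- | Bytes used by a constructor closure, given the word size in bytes,
-- the number of unboxed (non-pointer) words and the number of pointer
-- fields: one header word plus one word per field.
closureBytesFor :: Int -> Int -> Int -> Int
closureBytesFor wordSize nonPtrWords ptrFields =
  wordSize * (1 + nonPtrWords + ptrFields)

-- | Expected size on a 64-bit machine for the assumed layout
-- (8 unboxed words, 2 pointers): 11 words.
expected64 :: Int
expected64 = closureBytesFor 8 8 2

-- | What the sum becomes if the pointer array contributes its element
-- count (2) rather than its byte size (16). This is a hypothesis,
-- not a confirmed cause.
suspected64 :: Int
suspected64 = 8 + 2 + 8 * 8
```

    Here expected64 evaluates to 88 and suspected64 to 74, which matches the two numbers discussed above.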

    As for alignment, I believe every field should be word-aligned.
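
    On more recent GHC releases the result of unpackClosure# has a different shape from the one the function above pattern-matches on: as far as I know it is (# Addr#, ByteArray#, Array# b #), where the byte array holds a raw copy of the whole closure, header included. Under that assumption (please check the GHC.Exts documentation for your compiler version), a variant can be sketched as follows. closureSizeBytes and closureSizeWords are hypothetical names, and the result will differ in profiled builds, which add header words.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import Foreign.Storable (sizeOf)
import GHC.Exts (Int (I#), sizeofByteArray#, unpackClosure#)

-- | Size in bytes of the top-level closure of a value, not following
-- pointers. ASSUMPTION: unpackClosure# returns
-- (# info pointer, raw closure bytes, pointer array #), as in recent GHCs.
-- The argument should already be evaluated (e.g. via ($!)); otherwise
-- the thunk is measured instead of the constructor.
closureSizeBytes :: a -> Int
closureSizeBytes a =
  case unpackClosure# a of
    (# _info, raw, _ptrs #) -> I# (sizeofByteArray# raw)

-- | The same size expressed in machine words.
closureSizeWords :: a -> Int
closureSizeWords a = closureSizeBytes a `div` sizeOf (undefined :: Int)
```

    A call would look like closureSizeWords $! Just 'x'; for that value the rule of thumb predicts 2 words (header plus one field).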
