Advantages of strict fields in data types

余生长醉 提交于 2019-11-28 21:08:19

Unless you're storing a large computation in the Int and Float fields, significant overhead can build up from lots of trivial computations building up in thunks. For instance, if you repeatedly add 1 to a lazy Float field in a data type, it will use up more and more memory until you actually force the field, calculating it.

Often, you want to store to expensive computation in a field. But if you know you won't be doing anything like that ahead of time, you can mark the field strict, and avoid having to manually add seq everywhere to get the efficiency you desire.

As an additional bonus, when given the flag -funbox-strict-fields GHC will unpack strict fields1 of data types directly into the data type itself, which is possible since it knows they will always be evaluated, and thus no thunk has to be allocated; in this case, a Bar value would contain the machine words comprising the Int and Float directly inside the Bar value in memory, rather than containing two pointers to thunks which contain the data.

Laziness is a very useful thing, but some of the time, it just gets in the way and impedes computation, especially for small fields that are always looked at (and thus forced), or that are modified often but never with very expensive computations. Strict fields help overcome these issues without having to modify all uses of the data type.

Whether it's more common than lazy fields or not depends on the type of code you're reading; you aren't likely to see any functional tree structures use strict fields extensively, for instance, because they benefit greatly from laziness.

Let's say you have an AST with a constructor for infix operations:

data Exp = Infix Op Exp Exp
         | ...

data Op = Add | Subtract | Multiply | Divide

You wouldn't want to make the Exp fields strict, as applying a policy like that would mean that the entire AST is evaluated whenever you look at the top-level node, which is clearly not what you want to benefit from laziness. However, the Op field is never going to contain an expensive computation that you want to defer to a later date, and the overhead of a thunk per infix operator might get expensive if you have really deeply-nested parse trees. So for the infix constructor, you'd want to make the Op field strict, but leave the two Exp fields lazy.

1 Only single-constructor types can be unpacked.

mucaho

In addition to the information provided by other answers, keep in mind:

To my knowledge with !, one can make sure a parameter for a data constructor is being evaluated before the value is constructed

It's interesting to look at how deep the parameter is evaluated - it's like with seq and $! evaluated to WHNF.

Given the datatypes

data Foo = IntFoo !Int | FooFoo !Foo | BarFoo !Bar
data Bar = IntBar Int

the expression

let x' = IntFoo $ 1 + 2 + 3
in  x'

evaluated to WHNF produces value IntFoo 6 (== fully evaluated, == NF).
Additionally this expression

let x' = FooFoo $ IntFoo $ 1 + 2 + 3
in  x'

evaluated to WHNF produces value FooFoo (IntFoo 6) (== fully evaluated, == NF).
However, this expression

let x' = BarFoo $ IntBar $ 1 + 2 + 3
in  x'

evaluated to WHNF produces value BarFoo (IntBar (1 + 2 + 3)) (!= fully evaluated, != NF).

Main point: The strictness of the !Bar parameter won't necessarily help if the data constructors of Bar don't contain strict parameters themselves.

There is an overhead associated with lazyness — the compiler has to create a thunk for the value to store the computation until the result is needed. If you know that you'll always need the result sooner or later, then it can make sense to force the evaluation of the result.

Lazyness comes at a cost, otherwise every language would have it.

The cost is 2-fold:

  1. It may take longer to set up the thunk (i.e. the description of what has to be computed when it is going to be computed eventually) than to do the operation right away.
  2. Unevaluated thunks that go as non strict arguments to other thunks that go as non strict arguments to yet other thunks etc. will use more and more memory. Unfortunately, those tunks may also hold references to memory that is not accessible anymore, i.e. memory that could be freed when only the thunk would be evaluated, thus preventing the garbage collector to do its work. An example would be a thunk that is supposed to update a certain value in the tree. Say this value holds on 100MB worth of other values. If there is no reference to the old tree anymore, this memory is wasted as long as the thunk is not evaluated.
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!