Why is it ill-formed to have multi-line constexpr functions?

Submitted by 心已入冬 on 2019-12-03 22:13:20

The reason is that the compiler has plenty to do already, without also being a full-fledged interpreter, able to evaluate arbitrary C++ code.

If they stick with single expressions, they dramatically limit the number of cases to consider. Loosely speaking, the absence of semicolons in particular simplifies things a lot.

Every time a ; is encountered, the compiler has to deal with side effects: some local state was changed by the previous statement, and the following statement is going to rely on it. The code being evaluated is no longer just a series of simple operations, each taking the previous operation's output as its input; it also requires access to memory, which is much harder to reason about.

In a nutshell, this:

7 * 2 + 4 * 3

is simple to compute. You can build a syntax tree which looks like this:

   +
  /\
 /  \
 *   *
/\  /\
7 2 4 3

and the compiler can simply traverse this tree performing these primitive operations at each node, and the root node is implicitly the return value of the expression.
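To make the "no state needed" point concrete, here is a minimal sketch of such a tree-walking evaluator, itself written as a C++11-style single-expression constexpr function. The flat-array `Node` layout (children referenced by index) and the names `eval` and `tree` are hypothetical, chosen only for illustration; each subtree is evaluated purely from its children, with no memory shared between them.

```cpp
#include <cstddef>

// Hypothetical flat encoding of an expression tree: each node is either a
// literal (op == 'n') or a binary operator whose children are given by index.
struct Node {
    char op;           // 'n' = literal, '+' or '*' = binary operator
    int value;         // used only for literals
    std::size_t lhs;   // index of left child (operators only)
    std::size_t rhs;   // index of right child (operators only)
};

// Evaluates the subtree rooted at index i. Nothing is stored or loaded:
// the result depends only on the subtree itself, so the whole evaluator
// fits in a single return expression.
constexpr int eval(const Node* nodes, std::size_t i) {
    return nodes[i].op == '+' ? eval(nodes, nodes[i].lhs) + eval(nodes, nodes[i].rhs)
         : nodes[i].op == '*' ? eval(nodes, nodes[i].lhs) * eval(nodes, nodes[i].rhs)
         : nodes[i].value;
}

// 7 * 2 + 4 * 3, root at index 0
constexpr Node tree[] = {
    {'+', 0, 1, 2},   // 0: node 1 + node 2
    {'*', 0, 3, 4},   // 1: 7 * 2
    {'*', 0, 5, 6},   // 2: 4 * 3
    {'n', 7, 0, 0},   // 3: literal 7
    {'n', 2, 0, 0},   // 4: literal 2
    {'n', 4, 0, 0},   // 5: literal 4
    {'n', 3, 0, 0},   // 6: literal 3
};

// Checked entirely at compile time.
static_assert(eval(tree, 0) == 26, "7 * 2 + 4 * 3 == 26");
```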

If we were to write the same computation using multiple lines, we could do it like this:

int i0 = 7;
int i1 = 2;
int i2 = 4;
int i3 = 3;

int i4 = i0 * i1;
int i5 = i2 * i3;
int i6 = i4 + i5;
return i6;

which is much harder to interpret. We need to handle variable declarations, memory reads and writes, and return statements, so our syntax tree just became a lot more complex. We also need to handle statements which produce no value (say, a loop, or a memory write) but simply modify some memory somewhere. Which memory? Where? What if it accidentally overwrites some of the compiler's own memory? What if it segfaults?
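For comparison, a sketch of the two formulations as constexpr functions (the names `single_expression` and `multi_statement` are hypothetical). Only the first is a valid constexpr body under the C++11 rules this question is about; the second is exactly the kind of body C++11 rejects, and it compiles only because C++14 later relaxed those rules, so this block assumes a C++14 or newer compiler.

```cpp
// C++11-style constexpr: the body is a single return statement, so the
// evaluator only has to fold one expression tree.
constexpr int single_expression() {
    return 7 * 2 + 4 * 3;
}

// The same computation with local variables. C++11 rejects this body in a
// constexpr function; C++14 and later accept it, at the cost of the
// compiler's evaluator having to track the locals' values.
constexpr int multi_statement() {
    int i0 = 7;
    int i1 = 2;
    int i2 = 4;
    int i3 = 3;

    int i4 = i0 * i1;
    int i5 = i2 * i3;
    int i6 = i4 + i5;
    return i6;
}

static_assert(single_expression() == multi_statement(), "both forms compute 26");
```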

Even without all the nasty what-ifs, the code the compiler has to interpret just got a lot more complex. The syntax tree might now look something like this (LD and ST are load and store operations, respectively):

    ;    
    /\
   ST \
   /\  \
  i0 7  \
        ;
       /\
      ST \
      /\  \
     i1 2  \
           ;
          /\
         ST \
         / \ \
       i2  4  \
              ;
             /\
            ST \
            /\  \
           i3 3  \
                 ;
                /\
               ST \
               /\  \
              i4 *  \
                 /\  \
               LD LD  \
                |  |   \
                i0 i1   \
                        ;
                       /\
                      ST \
                      /\  \
                     i5 *  \
                        /\  \
                       LD LD \
                        |  |  \
                        i2 i3  \
                               ;
                              /\
                             ST \
                             /\  \
                            i6 +  \
                               /\  \
                              LD LD \
                               |  |  \
                               i4 i5  \
                                      LD
                                       |
                                       i6

Not only does it look a lot more complex, it also now requires state. Before, each subtree could be interpreted in isolation. Now, they all depend on the rest of the program: an LD leaf only makes sense if the tree is arranged so that a ST to the same location has already been executed.
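The ST/LD version can be sketched as a tiny interpreter. The instruction encoding (`Instr`, the opcode characters) and the `run` function are hypothetical, purely for illustration; the point is that, unlike the pure expression evaluator, it needs a memory array that persists from one instruction to the next. It relies on C++14 relaxed constexpr (a loop, a `switch`, and a mutable local array), which is the kind of machinery the C++11 rules avoided requiring.

```cpp
#include <cstddef>

// Hypothetical instruction set mirroring the ST/LD tree:
//   'c': mem[dst] = imm                 (store a constant)
//   '*': mem[dst] = mem[a] * mem[b]     (load, load, multiply, store)
//   '+': mem[dst] = mem[a] + mem[b]     (load, load, add, store)
//   'r': return mem[a]                  (load the result)
struct Instr {
    char op;
    std::size_t dst, a, b;
    int imm;
};

// The environment 'mem' outlives each instruction: a load is only
// meaningful if an earlier store wrote the same slot.
template <std::size_t N>
constexpr int run(const Instr (&prog)[N]) {
    int mem[8] = {};  // local "memory" for i0..i7
    for (std::size_t pc = 0; pc < N; ++pc) {
        const Instr& in = prog[pc];
        switch (in.op) {
            case 'c': mem[in.dst] = in.imm; break;
            case '*': mem[in.dst] = mem[in.a] * mem[in.b]; break;
            case '+': mem[in.dst] = mem[in.a] + mem[in.b]; break;
            case 'r': return mem[in.a];
        }
    }
    return 0;  // program ended without a return instruction
}

constexpr Instr program[] = {
    {'c', 0, 0, 0, 7},  // i0 = 7
    {'c', 1, 0, 0, 2},  // i1 = 2
    {'c', 2, 0, 0, 4},  // i2 = 4
    {'c', 3, 0, 0, 3},  // i3 = 3
    {'*', 4, 0, 1, 0},  // i4 = i0 * i1
    {'*', 5, 2, 3, 0},  // i5 = i2 * i3
    {'+', 6, 4, 5, 0},  // i6 = i4 + i5
    {'r', 0, 6, 0, 0},  // return i6
};

static_assert(run(program) == 26, "the stateful program also yields 26");
```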

Just in case there's any confusion here: constexpr functions/expressions are evaluated at compile time when they are used in a constant expression. There's no runtime performance concern involved.

Knowing this, the reason they only allow a single return statement in constexpr functions is so that compiler implementors don't need to write a virtual machine to calculate the constant value.

I am concerned about quality-of-implementation (QoI) issues with this, though. I wonder whether compiler implementors will be clever enough to perform memoization.

constexpr int fib(int n) { return n < 2 ? 1 : fib(n-1) + fib(n-2); }

Without memoization, the above function has O(2^n) complexity, which is certainly not something I'd like to feel, even at compile time.
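Whether a given compiler memoizes is a QoI question this answer can't settle. Independently of that, the function itself can be rewritten so that evaluation is linear. The sketch below first restates the question's `fib`, then swaps in a different technique, accumulator passing rather than memoization, in a hypothetical `fib_iter` that still fits the single-return-statement rule.

```cpp
// The question's fib: each call spawns two more, so a naive evaluator
// performs an exponential number of calls.
constexpr int fib(int n) { return n < 2 ? 1 : fib(n - 1) + fib(n - 2); }

// Hypothetical helper: carries the last two values along as parameters,
// making one recursive call per step (O(n)) while remaining a single
// return expression as C++11 requires.
constexpr int fib_iter(int n, int a = 1, int b = 1) {
    return n < 2 ? b : fib_iter(n - 1, b, a + b);
}

static_assert(fib(10) == fib_iter(10), "both versions agree");
static_assert(fib_iter(10) == 89, "fib(0) == fib(1) == 1 convention");
```

The trade-off is that the linear version only helps code you write yourself; memoization by the compiler would speed up the naive form without any rewrite.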

As I understand it, they kept it as simple as possible so as not to complicate the language (in fact, I seem to remember a time when recursive calls weren't allowed, but that is no longer the case). The rationale is that it's much easier to relax rules in future standards than to restrict them.

EDIT: Ignore this answer. The referenced paper is out of date. The standard will allow limited recursion (see the comments).

Both forms are illegal. Recursion isn't allowed in constexpr functions, due to the restriction that a constexpr function cannot be called until it is defined. The link the OP provided states this explicitly:

constexpr int twice(int x);
enum { bufsz = twice(256) }; // error: twice() isn’t (yet) defined

constexpr int fac(int x)
{ return x > 2 ? x * fac(x - 1) : 1; } // error: fac() not defined
                                       // before use

A few lines further down:

The requirement that a constant-expression function can only call previously defined constant-expression functions ensures that we don’t get into any problems related to recursion.

...

We (still) prohibit recursion in all its form in constant expressions.

Without these restrictions, you become embroiled in the halting problem (thanks @Grant for jogging my memory with your comment on my other answer). Rather than impose arbitrary recursion limits, the designers considered it simpler to just say, "No".
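As the EDIT above says, the quoted paper predates the final rules. In C++11 as published, a constexpr function may call itself; what still matters is that it is defined before a constant expression that calls it is evaluated, and implementations may cap the recursion depth (Annex B of the standard suggests supporting at least 512 recursive constexpr invocations). A minimal sketch of the paper's two examples in their accepted form; note that the factorial's base case is written as `x > 1` here, because the quoted `x > 2` makes `fac(2)` return 1.

```cpp
// Defined before use, so it can appear in a constant expression.
constexpr int twice(int x) { return 2 * x; }
enum { bufsz = twice(256) };  // OK: twice() is already defined

// Recursive constexpr function in the single-return-statement style.
constexpr int fac(int x) { return x > 1 ? x * fac(x - 1) : 1; }

static_assert(bufsz == 512, "twice(256) == 512");
static_assert(fac(5) == 120, "5! == 120");
```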

It's probably ill-formed because it's too hard to implement. A similar decision was made in the first version of the standard with regard to member function closures (i.e., being able to pass off obj.func as a callable function). Maybe a later revision of the standard will offer more latitude.
