How does the compiler know that the comma in a function call is not a comma operator?

Submitted by 只谈情不闲聊 on 2019-11-27 22:40:03
ams

Look at the grammar for the C language. It's listed, in full, in Annex A of the standard. The way it works is that you can step through each token in a C program and match it against the next item in the grammar. At each step you have only a limited number of options, so the interpretation of any given token depends on the context in which it appears. Inside each rule in the grammar, each line gives a valid alternative for the program to match.

Specifically, if you look for argument-expression-list (the rule for the arguments of a function call), you will see that it contains an explicit comma. Therefore, whenever the compiler's C parser is in "argument-expression-list" mode, commas that it finds will be understood as argument separators, not as comma operators. The same is true for parentheses (which can also occur in expressions).

This works because the argument-expression-list rule is careful to use assignment-expression rules, rather than just the plain expression rule. An expression can contain top-level commas, whereas an assignment-expression cannot. If this were not the case the grammar would be ambiguous, and the compiler would not know what to do when it encountered a comma inside an argument list.

However, an opening parenthesis that is not part of a function definition or call, or of an if, while, or for statement, will be interpreted as the start of a parenthesized expression (because there's no other option, and only if an expression is a valid choice at that point). Inside those parentheses the full expression rule applies, and that rule allows comma operators.

From C99 6.5.17:

As indicated by the syntax, the comma operator (as described in this subclause) cannot appear in contexts where a comma is used to separate items in a list (such as arguments to functions or lists of initializers). On the other hand, it can be used within a parenthesized expression or within the second expression of a conditional operator in such contexts. In the function call

f(a, (t=3, t+2), c)

the function has three arguments, the second of which has the value 5.
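The standard's example can be checked with a small sketch. The helper second_arg and the wrapper call_with_comma_operator are hypothetical names introduced here for illustration; they are not part of the standard or the original answers.

```c
#include <assert.h>

/* Hypothetical helper: takes three arguments and returns the second. */
static int second_arg(int a, int b, int c)
{
    (void)a;
    (void)c;
    return b;
}

/* Mirrors the standard's example f(a, (t=3, t+2), c). */
static int call_with_comma_operator(void)
{
    int a = 1, c = 9, t = 0;

    /* The two outer commas separate three arguments. The inner comma is
       protected by parentheses, so it is parsed as the comma operator:
       t = 3 is evaluated first, then t + 2 becomes the argument value. */
    int result = second_arg(a, (t = 3, t + 2), c);

    assert(t == 3);  /* the left operand's side effect took place */
    return result;
}
```

Calling call_with_comma_operator() from main should return 5, matching the value the standard gives for the second argument.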

Another similar example is the initializer list of arrays or structs:

int array[5] = {1, 2};
struct Foo bar = {1, 2};
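A minimal sketch of the same distinction in initializers, assuming a hypothetical function initializer_commas: inside braces the commas separate initializers, while a parenthesized initializer can contain a comma operator.

```c
#include <assert.h>

/* Hypothetical demo function; returns the value produced by the
   comma operator in the second declaration. */
static int initializer_commas(void)
{
    int t;

    /* Commas here separate initializers; the remaining elements are 0. */
    int array[5] = {1, 2};

    /* Parenthesized, so this comma is the comma operator:
       t = 1 is evaluated, then t + 1 initializes x. */
    int x = (t = 1, t + 1);

    assert(array[0] == 1 && array[1] == 2 && array[2] == 0);
    return x;
}
```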

If you do want a comma operator inside a function argument, wrap it in parentheses:

sum((a,b))

This passes a single argument, the value of b, so it won't compile if sum is declared to take two parameters.

The reason is the C Grammar. While everyone else seems to like to cite the example, the real deal is the phrase structure grammar for function calls in the Standard (C99). Yes, a function call consists of the () operator applied to a postfix expression (like for example an identifier):

 6.5.2 postfix-expression:
       ...
       postfix-expression ( argument-expression-list_opt )

together with

argument-expression-list:
       assignment-expression
       argument-expression-list , assignment-expression    <-- arglist comma

expression:
       assignment-expression
       expression , assignment-expression                  <-- comma operator

The comma operator can only occur in an expression, i.e. further down in the grammar. So the compiler treats a comma in a function argument list as one separating assignment-expressions, not as a comma operator joining expressions.
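To make the two rules concrete, here is a rough sketch of a hand-written recursive-descent parser for a heavily simplified subset of C expressions (identifiers, numbers, +, =, calls, parentheses). It is an illustration only: real compilers may use table-driven or other parsing techniques, and every name in it (comma_counts, classify_commas, and so on) is made up for this example. The point is that the same ',' character is counted differently depending on which rule is currently being matched.

```c
#include <ctype.h>

/* Counts of the two kinds of comma seen while parsing. */
struct comma_counts {
    int separators;  /* commas matched by the argument-expression-list rule */
    int operators;   /* commas matched by the expression rule */
};

struct parser {
    const char *p;               /* current position in the input */
    struct comma_counts counts;
    int error;                   /* set to 1 on the first syntax error */
};

static void skip_ws(struct parser *ps)
{
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

/* Consume character c if it is the next token; return 1 on success. */
static int match(struct parser *ps, char c)
{
    skip_ws(ps);
    if (*ps->p == c) { ps->p++; return 1; }
    return 0;
}

static void parse_expression(struct parser *ps);
static void parse_assignment(struct parser *ps);

/* primary: identifier | number | '(' expression ')' */
static void parse_primary(struct parser *ps)
{
    if (match(ps, '(')) {
        parse_expression(ps);    /* the full expression rule applies here */
        if (!match(ps, ')')) ps->error = 1;
    } else if (isalnum((unsigned char)*ps->p) || *ps->p == '_') {
        while (isalnum((unsigned char)*ps->p) || *ps->p == '_') ps->p++;
    } else {
        ps->error = 1;
    }
}

/* postfix: primary { '(' [ argument-expression-list ] ')' } */
static void parse_postfix(struct parser *ps)
{
    parse_primary(ps);
    while (!ps->error && match(ps, '(')) {
        if (match(ps, ')')) continue;        /* empty argument list */
        parse_assignment(ps);                /* not parse_expression */
        while (!ps->error && match(ps, ',')) {
            ps->counts.separators++;         /* argument separator */
            parse_assignment(ps);
        }
        if (!match(ps, ')')) ps->error = 1;
    }
}

/* additive: postfix { '+' postfix } */
static void parse_additive(struct parser *ps)
{
    parse_postfix(ps);
    while (!ps->error && match(ps, '+')) parse_postfix(ps);
}

/* assignment: additive [ '=' assignment ]   (no lvalue check) */
static void parse_assignment(struct parser *ps)
{
    parse_additive(ps);
    if (!ps->error && match(ps, '=')) parse_assignment(ps);
}

/* expression: assignment { ',' assignment } */
static void parse_expression(struct parser *ps)
{
    parse_assignment(ps);
    while (!ps->error && match(ps, ',')) {
        ps->counts.operators++;              /* comma operator */
        parse_assignment(ps);
    }
}

/* Parse src as an expression; return 0 and fill *out on success,
   -1 on a syntax error or trailing input. */
static int classify_commas(const char *src, struct comma_counts *out)
{
    struct parser ps = { src, {0, 0}, 0 };
    parse_expression(&ps);
    skip_ws(&ps);
    if (ps.error || *ps.p != '\0') return -1;
    *out = ps.counts;
    return 0;
}
```

For the input "f(a, (t=3, t+2), c)" this sketch should report two separators and one operator. The argument loop in parse_postfix calls parse_assignment, which never consumes a comma itself, so any comma it sees must be a separator; only the parenthesized branch in parse_primary re-enters parse_expression, where a comma is an operator.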

Roddy

Existing answers say "because the C language spec says it's a list separator, and not an operator".

However, your question is asking "how does the compiler know...", and that's altogether different: It's really no different from how the compiler knows that the comma in printf("Hello, world\n"); isn't a comma operator: The compiler 'knows' because of the context where the comma appears - basically, what's gone before.

The C 'language' can be described in Backus-Naur Form (BNF) - essentially, a set of rules that the compiler's parser uses to parse your input file. The BNF for C will distinguish between these different possible occurrences of commas in the language.

There are lots of good resources on how compilers work, and how to write one.

The draft C99 standard says:

As indicated by the syntax, the comma operator (as described in this subclause) cannot appear in contexts where a comma is used to separate items in a list (such as arguments to functions or lists of initializers). On the other hand, it can be used within a parenthesized expression or within the second expression of a conditional operator in such contexts. In the function call f(a, (t=3, t+2), c) the function has three arguments, the second of which has the value 5.

In other words, "because".

There are multiple facets to this question. One part is that the definition says so. Well, how does the compiler know what context this comma is in? That's the parser's job. For C in particular, the language can be parsed by an LR(1) parser (http://en.wikipedia.org/wiki/Canonical_LR_parser).

The way this works is that a parser generator produces a set of tables from the grammar, and these tables describe the possible states of the parser. Only a certain set of symbols is valid in each state, and the same symbol may have a different meaning in different states. The parser knows that it is parsing a function call because of the preceding symbols. Thus, it knows that the comma operator is not among the possibilities in that state.

I am being very general here, but you can read all about the details in the Wikipedia article linked above.
