I\'m asking with regards to c#, but I assume its the same in most other languages.
Does anyone have a good definition of expressions and statements
Expressions can be evaluated to get a value, whereas statements don't return a value (they're of type void).
Function call expressions can also be considered statements of course, but unless the execution environment has a special built-in variable to hold the returned value, there is no way to retrieve it.
Statement-oriented languages require all procedures to be a list of statements. Expression-oriented languages, which is probably all functional languages, are lists of expressions, or in tha case of LISP, one long S-expression that represents a list of expressions.
Although both types can be composed, most expressions can be composed arbitrarily as long as the types match up. Each type of statement has its own way of composing other statements, if they can do that all. Foreach and if statements require either a single statment or that all subordinate statements go in a statement block, one after another, unless the substatements allow for thier own substatements.
Statements can also include expressions, where an expression doesn't really include any statements. One exception, though, would be a lambda expression, which represents a function, and so can include anything a function can iclude unless the language only allows for limited lambdas, like Python's single-expression lambdas.
In an expression-based language, all you need is a single expression for a function since all control structures return a value (a lot of them return NIL). There's no need for a return statement since the last-evaluated expression in the function is the return value.
I am not really satisfied with any of the answers here. I looked at the grammar for C++ (ISO 2008). However maybe for the sake of didactics and programming the answers might suffice to distinguish the two elements (reality looks more complicated though).
A statement consists of zero or more expressions, but can also be other language concepts. This is the Extended Backus Naur form for the grammar (excerpt for statement):
statement:
labeled-statement
expression-statement <-- can be zero or more expressions
compound-statement
selection-statement
iteration-statement
jump-statement
declaration-statement
try-block
We can see the other concepts that are considered statements in C++.
case
for example is a labeled-statementif
if/else
, case
while
, do...while
, for (...)
break
, continue
, return
(can return expression), goto
try/catch
blocksThis is an excerpt showing the expressions part:
expression:
assignment-expression
expression "," assignment-expression
assignment-expression:
conditional-expression
logical-or-expression assignment-operator initializer-clause
throw-expression
+
, -
, *
, /
, &
, |
, &&
, ||
, ...)throw
clause is an expression tooExpression
A piece of syntax which can be evaluated to some value. In other words, an expression is an accumulation of expression elements like literals, names, attribute access, operators or function calls which all return a value. In contrast to many other languages, not all language constructs are expressions. There are also statements which cannot be used as expressions, such as while. Assignments are also statements, not expressions.
Statement
A statement is part of a suite (a “block” of code). A statement is either an expression or one of several constructs with a keyword, such as if, while or for.
A statement is a special case of an expression, one with void
type. The tendency of languages to treat statements differently often causes problems, and it would be better if they were properly generalized.
For example, in C# we have the very useful Func<T1, T2, T3, TResult>
overloaded set of generic delegates. But we also have to have a corresponding Action<T1, T2, T3>
set as well, and general purpose higher-order programming constantly has to be duplicated to deal with this unfortunate bifurcation.
Trivial example - a function that checks whether a reference is null before calling onto another function:
TResult IfNotNull<TValue, TResult>(TValue value, Func<TValue, TResult> func)
where TValue : class
{
return (value == null) ? default(TValue) : func(value);
}
Could the compiler deal with the possibility of TResult
being void
? Yes. All it has to do is require that return is followed by an expression that is of type void
. The result of default(void)
would be of type void
, and the func being passed in would need to be of the form Func<TValue, void>
(which would be equivalent to Action<TValue>
).
A number of other answers imply that you can't chain statements like you can with expressions, but I'm not sure where this idea comes from. We can think of the ;
that appears after statements as a binary infix operator, taking two expressions of type void
and combining them into a single expression of type void
.
Expression: Something which evaluates to a value. Example: 1+2/x
Statement: A line of code which does something. Example: GOTO 100
In the earliest general-purpose programming languages, like FORTRAN, the distinction was crystal-clear. In FORTRAN, a statement was one unit of execution, a thing that you did. The only reason it wasn't called a "line" was because sometimes it spanned multiple lines. An expression on its own couldn't do anything... you had to assign it to a variable.
1 + 2 / X
is an error in FORTRAN, because it doesn't do anything. You had to do something with that expression:
X = 1 + 2 / X
FORTRAN didn't have a grammar as we know it today—that idea was invented, along with Backus-Naur Form (BNF), as part of the definition of Algol-60. At that point the semantic distinction ("have a value" versus "do something") was enshrined in syntax: one kind of phrase was an expression, and another was a statement, and the parser could tell them apart.
Designers of later languages blurred the distinction: they allowed syntactic expressions to do things, and they allowed syntactic statements that had values. The earliest popular language example that still survives is C. The designers of C realized that no harm was done if you were allowed to evaluate an expression and throw away the result. In C, every syntactic expression can be a made into a statement just by tacking a semicolon along the end:
1 + 2 / x;
is a totally legit statement even though absolutely nothing will happen. Similarly, in C, an expression can have side-effects—it can change something.
1 + 2 / callfunc(12);
because callfunc
might just do something useful.
Once you allow any expression to be a statement, you might as well allow the assignment operator (=) inside expressions. That's why C lets you do things like
callfunc(x = 2);
This evaluates the expression x = 2 (assigning the value of 2 to x) and then passes that (the 2) to the function callfunc
.
This blurring of expressions and statements occurs in all the C-derivatives (C, C++, C#, and Java), which still have some statements (like while
) but which allow almost any expression to be used as a statement (in C# only assignment, call, increment, and decrement expressions may be used as statements; see Scott Wisniewski's answer).
Having two "syntactic categories" (which is the technical name for the sort of thing statements and expressions are) can lead to duplication of effort. For example, C has two forms of conditional, the statement form
if (E) S1; else S2;
and the expression form
E ? E1 : E2
And sometimes people want duplication that isn't there: in standard C, for example, only a statement can declare a new local variable—but this ability is useful enough that the GNU C compiler provides a GNU extension that enables an expression to declare a local variable as well.
Designers of other languages didn't like this kind of duplication, and they saw early on that if expressions can have side effects as well as values, then the syntactic distinction between statements and expressions is not all that useful—so they got rid of it. Haskell, Icon, Lisp, and ML are all languages that don't have syntactic statements—they only have expressions. Even the class structured looping and conditional forms are considered expressions, and they have values—but not very interesting ones.
For an explanation of important differences in composability (chainability) of expressions vs statements, my favorite reference is John Backus's Turing award paper, Can programming be liberated from the von Neumann style?.
Imperative languages (Fortran, C, Java, ...) emphasize statements for structuring programs, and have expressions as a sort of after-thought. Functional languages emphasize expressions. Purely functional languages have such powerful expressions than statements can be eliminated altogether.