When I try to use print
without parentheses on a simple name in Python 3.4 I get:
>>> print max
Traceback (most recent call last):
...
The special exception message for print
used as statement instead of as function is actually implemented as a special case.
Roughly speaking when a SyntaxError
is created it calls a special function that checks for a print
statement based on the line the exception refers to.
However, the first test in this function (the one responsible for the "Missing parenthesis" error message) is if there is any opening parenthesis in the line. I copied the source code for that function (CPython 3.6.4) and I marked the relevant lines with "arrows":
static int
_report_missing_parentheses(PySyntaxErrorObject *self)
{
Py_UCS4 left_paren = 40;
Py_ssize_t left_paren_index;
Py_ssize_t text_len = PyUnicode_GET_LENGTH(self->text);
int legacy_check_result = 0;
/* Skip entirely if there is an opening parenthesis <---------------------------- */
left_paren_index = PyUnicode_FindChar(self->text, left_paren,
0, text_len, 1);
if (left_paren_index < -1) {
return -1;
}
if (left_paren_index != -1) {
/* Use default error message for any line with an opening parenthesis <------------ */
return 0;
}
/* Handle the simple statement case */
legacy_check_result = _check_for_legacy_statements(self, 0);
if (legacy_check_result < 0) {
return -1;
}
if (legacy_check_result == 0) {
/* Handle the one-line complex statement case */
Py_UCS4 colon = 58;
Py_ssize_t colon_index;
colon_index = PyUnicode_FindChar(self->text, colon,
0, text_len, 1);
if (colon_index < -1) {
return -1;
}
if (colon_index >= 0 && colon_index < text_len) {
/* Check again, starting from just after the colon */
if (_check_for_legacy_statements(self, colon_index+1) < 0) {
return -1;
}
}
}
return 0;
}
That means it won't trigger the "Missing parenthesis" message if there is any opening parenthesis in the line. That leads to the general SyntaxError
message even if the opening parenthesis is in a comment:
print 10 # what(
print 10 # what(
^
SyntaxError: invalid syntax
Note that the cursor position for two names/variables separated by a white space is always the end of the second name:
>>> 10 100
10 100
^
SyntaxError: invalid syntax
>>> name1 name2
name1 name2
^
SyntaxError: invalid syntax
>>> name1 name2([1, 2])
name1 name2([1, 2])
^
SyntaxError: invalid syntax
So it is no wonder the cursor points to the x
of max
, because it's the last character in the second name. Everything that follows the second name (like .
, (
, [
, ...) is ignored, because Python already found a SyntaxError
, and it doesn't need to go further, because nothing could make it valid syntax.
in additions to those excellent answers, without even looking at the source code, we could have guessed that the print
special error message was a kludge:
so:
print dfjdkf
^
SyntaxError: Missing parentheses in call to 'print'
but:
>>> a = print
>>> a dsds
Traceback (most recent call last):
File "<interactive input>", line 1
a dsds
^
SyntaxError: invalid syntax
even if a == print
but at that stage, it isn't evaluated yet, so you get the generic invalid syntax message instead of the hacked print
syntax message, which proves that there's a kludge when the first token is print
.
another proof if needed:
>>> print = None
>>> print a
Traceback (most recent call last):
File "C:\Python34\lib\code.py", line 63, in runsource
print a
^
SyntaxError: Missing parentheses in call to 'print'
in that case print == None
, but the specific message still appears.
Looking at the source code for exceptions.c, right above _set_legacy_print_statement_msg
there's this nice block comment:
/* To help with migration from Python 2, SyntaxError.__init__ applies some
* heuristics to try to report a more meaningful exception when print and
* exec are used like statements.
*
* The heuristics are currently expected to detect the following cases:
* - top level statement
* - statement in a nested suite
* - trailing section of a one line complex statement
*
* They're currently known not to trigger:
* - after a semi-colon
*
* The error message can be a bit odd in cases where the "arguments" are
* completely illegal syntactically, but that isn't worth the hassle of
* fixing.
*
* We also can't do anything about cases that are legal Python 3 syntax
* but mean something entirely different from what they did in Python 2
* (omitting the arguments entirely, printing items preceded by a unary plus
* or minus, using the stream redirection syntax).
*/
So there's some interesting info. In addition, in the SyntaxError_init
method in the same file, we can see
/*
* Issue #21669: Custom error for 'print' & 'exec' as statements
*
* Only applies to SyntaxError instances, not to subclasses such
* as TabError or IndentationError (see issue #31161)
*/
if ((PyObject*)Py_TYPE(self) == PyExc_SyntaxError &&
self->text && PyUnicode_Check(self->text) &&
_report_missing_parentheses(self) < 0) {
return -1;
}
Note also that the above references issue #21669 on the python bugtracker with some discussion between the author and Guido about how to go about this. So we follow the rabbit (that is, _report_missing_parentheses
) which is at the very bottom of the file, and see...
legacy_check_result = _check_for_legacy_statements(self, 0);
However, there are some cases where this is bypassed and the normal SyntaxError
message is printed, see MSeifert's answer for more about that. If we go one function up to _check_for_legacy_statements
we finally see the actual check for legacy print statements.
/* Check for legacy print statements */
if (print_prefix == NULL) {
print_prefix = PyUnicode_InternFromString("print ");
if (print_prefix == NULL) {
return -1;
}
}
if (PyUnicode_Tailmatch(self->text, print_prefix,
start, text_len, -1)) {
return _set_legacy_print_statement_msg(self, start);
}
So, to answer the question: "Why isn't Python able to detect the problem earlier?", I would say the problem with parentheses isn't what is detected; it is actually parsed after the syntax error. It's a syntax error the whole time, but the actual minor piece about parentheses is caught afterwards just to give an additional hint.
Maybe I'm not understanding something, but I don't see why Python should point out the error earlier. print
is a regular function, that is a variable referencing a function, so these are all valid statements:
print(10)
print, max, 2
str(print)
print.__doc__
[print] + ['a', 'b']
{print: 2}
As I understand it, the parser needs to read the next full token after print
(max
in this case) in order to determine whether there is a syntax error. It cannot just say "fail if there is no open parenthesis", because there are a number of different tokens that may go after print
depending on the current context.
I don't think there is a case where print
may be directly followed by another identifier or a literal, so you could argue that as soon as there is one letter, a number or quotes you should stop, but that would be mixing the parser's and the lexer's job.