TL;DR: see the final line; the rest is just preamble.
I am developing a test harness, which parses user scripts and generates a Python script which it then ru
You could use exception handling and catch NameError and SyntaxError. Test the name inside a try/except block and inform the user if the input is invalid.
In Python 3, as above, you can simply use str.isidentifier. But in Python 2, this does not exist.
The tokenize module has a regex for names (identifiers): tokenize.Name. But I couldn't find any documentation for it, so it may not be available everywhere. It is simply r'[a-zA-Z_]\w*'. A single $ after it will let you test strings with re.match.
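As a minimal sketch of that regex test (the names NAME_PATTERN and looks_like_name are made up for illustration, and the pattern is hard-coded rather than read from the undocumented tokenize.Name):

```python
import re

# Pattern quoted in the answer; hard-coded here as an assumption instead of
# being read from tokenize.Name.
NAME_PATTERN = r'[a-zA-Z_]\w*'

# \Z anchors at the very end of the string. A plain $ also matches just
# before a trailing newline, so 'foo\n' would be accepted with $.
_name_re = re.compile(NAME_PATTERN + r'\Z')


def looks_like_name(s):
    """Return True if s matches the identifier pattern from start to end."""
    return _name_re.match(s) is not None
```

This uses \Z instead of the $ mentioned above purely to reject a trailing newline; if that does not matter for your input, $ behaves the same otherwise. Keywords still need a separate check.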
The docs say that an identifier is defined by this grammar:
identifier ::= (letter|"_") (letter | digit | "_")*
letter ::= lowercase | uppercase
lowercase ::= "a"..."z"
uppercase ::= "A"..."Z"
digit ::= "0"..."9"
Which is equivalent to the regex above. But we should still import tokenize.Name in case this ever changes. (Which is very unlikely, but maybe in older versions of Python it was different?)
And to filter out keywords, like pass, def and return, use keyword.iskeyword. There is one caveat: None is not a keyword in Python 2, but still can't be assigned to. (keyword.iskeyword('None') in Python 2 is False.)
So:
import keyword

if hasattr(str, 'isidentifier'):
    _isidentifier = str.isidentifier
else:
    import re
    _fallback_pattern = '[a-zA-Z_][a-zA-Z0-9_]*'
    try:
        import tokenize
    except ImportError:
        _isidentifier = re.compile(_fallback_pattern + '$').match
    else:
        _isidentifier = re.compile(
            getattr(tokenize, 'Name', _fallback_pattern) + '$'
        ).match

    del _fallback_pattern


def isname(s):
    return bool(_isidentifier(s)) and not keyword.iskeyword(s) and s != 'None'
You can just let Python (works on any version in use today, as far as I know) do the check for you the way it normally would internally, and catch the exception:
def _dummy_function_taking_kwargs(**_):
    pass

try:
    _dummy_function_taking_kwargs(**{my_variable: None})
    # if the above line didn't raise and we get here,
    # the keyword/variable name was valid.
    # You could also replace the external dummy function
    # with an inline lambda function.
except TypeError:
    # If we get here, it wasn't.
    pass
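Notably, TypeError is raised whenever a dict undergoes keyword argument expansion and has a key which isn't a string. Be aware, though, that at least in CPython a function accepting **kwargs takes any string key, including strings such as '2fg' or 'foo bar' that are not valid identifiers, so on its own this check is weaker than it looks.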
The advantage over the accepted answer is that it is compatible with both Python 3 and 2, and not as fragile as the ast.parse/compile approach (which would count strings like foo = bar; qux as valid).
I haven't thoroughly audited this solution or written Hypothesis tests for it to fuzz it, so there might be some corner case, but it seems to generally work on Python 3.7, 3.6, 2.7, and 2.5 (not that anyone ought to be using 2.5 nowadays, but it's still out in the wild and you might be one of the few poor sods stuck having to write code that works with 2.6/2.5).
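Since the check has not been audited, a small probe helps show what it actually accepts. This is only a sketch: passes_kwargs_check and the candidate list are made up for illustration, and the behaviour described is what I would expect from CPython 3.

```python
def _dummy_function_taking_kwargs(**_):
    pass


def passes_kwargs_check(name):
    """Return True if name survives keyword-argument expansion."""
    try:
        _dummy_function_taking_kwargs(**{name: None})
    except TypeError:
        return False
    return True


# A mix of valid names, invalid names, a keyword and a non-string.
candidates = ['foo', '2fg', 'foo bar', 'while', 42]
results = {repr(c): passes_kwargs_check(c) for c in candidates}
```

On CPython 3, every string in the list passes and only the non-string 42 is rejected, so this check is best combined with a real identifier test such as a regex or str.isidentifier plus keyword.iskeyword.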
I don't think you need the exact same naming syntax as Python itself. I would rather go for a simple regexp like:
\w+
to make sure it's something alphanumeric, and then add a prefix to keep away from Python's own syntax. So the non-techie user's declaration:
LET return = 12
should probably become after your parsing:
userspace_return = 12
or
userspace['return'] = 12
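A rough sketch of the dictionary variant follows. The LET grammar here is an assumption (a single name, an equals sign and an integer literal), and translate_let is a made-up helper name; a real harness would accept richer right-hand sides.

```python
import re

# Assumed statement shape: LET <name> = <integer>
_LET_RE = re.compile(r'LET\s+(\w+)\s*=\s*(-?\d+)\s*\Z')


def translate_let(statement, namespace):
    """Store the user's variable in namespace and return the generated line."""
    match = _LET_RE.match(statement)
    if match is None:
        raise ValueError('not a valid LET statement: %r' % (statement,))
    name, value = match.groups()
    # The user's name is only ever a dict key, so it can never collide
    # with a Python keyword or builtin.
    namespace[name] = int(value)
    return 'userspace[%r] = %s' % (name, value)
```

With this, translate_let('LET return = 12', {}) produces the line userspace['return'] = 12, and the user's choice of name never has to be a legal Python identifier.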
You could try a test assignment and see if it raises a SyntaxError:
>>> 2fg = 5
  File "<stdin>", line 1
    2fg = 5
      ^
SyntaxError: invalid syntax
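The same test can be done programmatically without running anything, by compiling the assignment and catching the error. A minimal sketch, where assignment_compiles is a made-up name:

```python
def assignment_compiles(name):
    """Return True if '<name> = None' compiles as Python source."""
    try:
        # compile() only builds a code object; the assignment is never run.
        compile('%s = None' % (name,), '<test>', 'exec')
    except (SyntaxError, ValueError):
        # ValueError covers odd input such as embedded null bytes
        # on some Python versions.
        return False
    return True
```

Note that this accepts anything that is a legal assignment target, so a string like a.b also passes; it is a quick sanity check rather than a strict identifier test.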
In Python 3 you can use str.isidentifier() to test whether a given string is a valid Python identifier/name.
>>> 'X'.isidentifier()
True
>>> 'X123'.isidentifier()
True
>>> '2'.isidentifier()
False
>>> 'while'.isidentifier()
True
The last example shows that you should also check whether the variable name clashes with a Python keyword:
>>> from keyword import iskeyword
>>> iskeyword('X')
False
>>> iskeyword('while')
True
So you could put that together in a function:
from keyword import iskeyword
def is_valid_variable_name(name):
    return name.isidentifier() and not iskeyword(name)
Another option, which works in Python 2 and 3, is to use the ast module:
from ast import parse

def is_valid_variable_name(name):
    try:
        parse('{} = None'.format(name))
        return True
    except (SyntaxError, ValueError, TypeError):
        # The exception types must be a parenthesized tuple.
        return False
>>> is_valid_variable_name('X')
True
>>> is_valid_variable_name('123')
False
>>> is_valid_variable_name('for')
False
>>> is_valid_variable_name('')
False
>>> is_valid_variable_name(42)
False
This will parse the assignment statement without actually executing it. It will pick up invalid identifiers as well as attempts to assign to a keyword. In the above code None is an arbitrary value to assign to the given name - it could be any valid expression for the RHS.
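Parsing alone also accepts input that is a valid statement but not a single name, for example x = y; z or a.b. One way to tighten it, sketched here for Python 3 only (is_single_name_assignment is a made-up name), is to inspect the resulting tree:

```python
import ast


def is_single_name_assignment(name):
    """Return True if '<name> = None' parses to one assignment to a bare name."""
    if not isinstance(name, str):
        return False
    try:
        tree = ast.parse('{} = None'.format(name))
    except (SyntaxError, ValueError):
        return False
    # Exactly one statement, and it must be an assignment.
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        return False
    targets = tree.body[0].targets
    # Exactly one target, a plain name, whose id is the input itself.
    # This rejects 'a.b', 'a[0]', 'x = y' and 'x = y; z'.
    return (
        len(targets) == 1
        and isinstance(targets[0], ast.Name)
        and targets[0].id == name
    )
```

Comparing the parsed name against the input also rejects strings with stray whitespace or parentheses, which would otherwise parse to the same target.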