Pythonically check if a variable name is valid

臣服心动 2020-12-29 07:37

tl;dr: see the final line; the rest is just preamble.


I am developing a test harness, which parses user scripts and generates a Python script which it then ru

6 answers
  • 2020-12-29 07:52

    You could use exception handling and catch NameError and SyntaxError: run the test inside a try/except block and inform the user if the input is invalid.
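
    A minimal sketch of that idea, assuming the harness only needs a yes/no answer plus a message for the user. The helper name check_name and the '<user script>' label are made up for illustration; compile() only compiles the test assignment and never runs it. Note that this simple form would still accept input such as "foo = bar; qux", so it is a first filter rather than a complete check.

```python
def check_name(name):
    """Return (ok, message) for a candidate variable name (hypothetical helper)."""
    try:
        # Compile a throwaway assignment; nothing is executed.
        compile('{} = 0'.format(name), '<user script>', 'exec')
    except (SyntaxError, ValueError) as exc:
        # ValueError covers inputs such as embedded null bytes on some versions.
        return False, 'invalid variable name {!r}: {}'.format(name, exc)
    return True, ''


for candidate in ('total', '2fg', 'while'):
    ok, message = check_name(candidate)
    print(candidate, ok, message)
```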

  • 2020-12-29 07:54

    In Python 3, as another answer notes, you can simply use str.isidentifier. But in Python 2, this does not exist.

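    The tokenize module has a regex for names (identifiers): tokenize.Name. I couldn't find any documentation for it, though, so it may not be available everywhere. In Python 2 it is simply r'[a-zA-Z_]\w*'. Appending a single $ lets you test whole strings with re.match (bear in mind that $ also matches just before a trailing newline, so 'abc\n' would pass; use \Z instead if that matters).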

    The docs say that an identifier is defined by this grammar:

    identifier ::=  (letter|"_") (letter | digit | "_")*
    letter     ::=  lowercase | uppercase
    lowercase  ::=  "a"..."z"
    uppercase  ::=  "A"..."Z"
    digit      ::=  "0"..."9"
    

    Which is equivalent to the regex above. But we should still import tokenize.Name in case this ever changes. (Which is very unlikely, but maybe in older versions of Python it was different?)

    And to filter out keywords, like pass, def and return, use keyword.iskeyword. There is one caveat: None is not a keyword in Python 2, but still can't be assigned to. (keyword.iskeyword('None') in Python 2 is False).

    So:

    import keyword
    
    if hasattr(str, 'isidentifier'):
        _isidentifier = str.isidentifier
    else:
        import re
        _fallback_pattern = '[a-zA-Z_][a-zA-Z0-9_]*'
        try:
            import tokenize
        except ImportError:
            _isidentifier = re.compile(_fallback_pattern + '$').match
        else:
            _isidentifier = re.compile(
                getattr(tokenize, 'Name', _fallback_pattern) + '$'
            ).match
    
        del _fallback_pattern
    
    
    def isname(s):
        return bool(_isidentifier(s)) and not keyword.iskeyword(s) and s != 'None'
    
  • 2020-12-29 08:00

    You can just let Python (any version in use today, as far as I know) do the check for you the way it normally would internally, and catch the exception:

    def _dummy_function_taking_kwargs(**_):
        pass
    
    try:
        _dummy_function_taking_kwargs(**{my_variable: None})
        # if the above line didn't raise and we get here,
        # the keyword/variable name was valid.
        # You could also replace the external dummy function
        # with an inline lambda function.
    except TypeError:
        # If we get here, it wasn't.
        pass
    

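    Notably, TypeError is raised when a dict undergoes keyword argument expansion and one of its keys is not a string (and when a dict is built with an unhashable key). Be aware, though, that on CPython a string key such as '2fg' is passed through to **kwargs without complaint, so pair this with another check if such names must be rejected.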

    The advantage over the accepted answer is that it is compatible with both Python 3 and 2, and not as fragile as the ast.parse/compile approach (which would count strings like foo = bar; qux as valid).

    I haven't thoroughly audited this solution or fuzzed it with Hypothesis tests, so there may be corner cases, but it seems to work in general on Python 3.7, 3.6, 2.7, and 2.5. (Nobody ought to be using 2.5 nowadays, but it is still out in the wild, and you might be one of the few poor sods stuck writing code that has to work with 2.6/2.5.)
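
    Since this has not been audited, a quick experiment shows which keys keyword expansion actually rejects. This is only a sketch with made-up helper names, and the results in the comments are what CPython gives; other implementations may differ.

```python
def _accepts_kwargs(**_):
    pass


def raises_type_error(key):
    """Return True if expanding {key: None} as keyword arguments raises TypeError."""
    try:
        _accepts_kwargs(**{key: None})
    except TypeError:
        return True
    return False


print(raises_type_error('valid_name'))  # False
print(raises_type_error('2fg'))         # False: any string key is accepted
print(raises_type_error(42))            # True: keys must be strings
```

    In other words, the trick separates strings from non-strings, but on its own it does not reject strings that are not identifiers.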

  • 2020-12-29 08:04

    I don't think you need exactly the same naming syntax as Python itself. I would rather go for a simple regexp like:

    \w+
    

    to make sure the name is alphanumeric, and then add a prefix to keep it away from Python's own syntax. So the non-techie user's declaration:

    LET return = 12
    

    should probably become, after your parsing:

    userspace_return = 12
    or
    userspace['return'] = 12
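
    A rough sketch of the dictionary variant, assuming the user scripts contain lines of the form "LET name = value" as in the example above. The function name translate_let and the pattern are hypothetical, and the right-hand side is passed through unchecked here, which a real harness would need to validate separately.

```python
import re

# "LET <name> = <value>", where <name> is any run of word characters.
_LET = re.compile(r'LET\s+(\w+)\s*=\s*(.+)\Z')


def translate_let(line):
    """Turn 'LET name = value' into an assignment into a dict named userspace."""
    match = _LET.match(line.strip())
    if match is None:
        raise ValueError('not a LET statement: {!r}'.format(line))
    name, value = match.groups()
    # Using the name as a dict key sidesteps keywords such as "return".
    return 'userspace[{!r}] = {}'.format(name, value)


print(translate_let('LET return = 12'))  # userspace['return'] = 12
```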
    
  • 2020-12-29 08:10

    You could try a test assignment and see if it raises a SyntaxError:

    >>> 2fg = 5
      File "<stdin>", line 1
        2fg = 5
          ^
    SyntaxError: invalid syntax
    
  • 2020-12-29 08:11

    In Python 3 you can use str.isidentifier() to test whether a given string is a valid Python identifier/name.

    >>> 'X'.isidentifier()
    True
    >>> 'X123'.isidentifier()
    True
    >>> '2'.isidentifier()
    False
    >>> 'while'.isidentifier()
    True
    

    The last example shows that you should also check whether the variable name clashes with a Python keyword:

    >>> from keyword import iskeyword
    >>> iskeyword('X')
    False
    >>> iskeyword('while')
    True
    

    So you could put that together in a function:

    from keyword import iskeyword
    
    def is_valid_variable_name(name):
        return name.isidentifier() and not iskeyword(name)
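
    If the input might not be a string at all (the value 42, say), name.isidentifier() raises AttributeError. A slightly more defensive variant for Python 3 could look like the sketch below; the name is_plain_identifier is made up to keep it distinct from the function above.

```python
from keyword import iskeyword


def is_plain_identifier(name):
    """Python 3 only: True for a string that is an identifier and not a keyword."""
    # The isinstance guard makes non-string input return False instead of raising.
    return isinstance(name, str) and name.isidentifier() and not iskeyword(name)


for candidate in ('X', 'while', '', 42):
    print(repr(candidate), is_plain_identifier(candidate))
```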
    

    Another option, which works in Python 2 and 3, is to use the ast module:

    from ast import parse
    
    def is_valid_variable_name(name):
        try:
            parse('{} = None'.format(name))
            return True
        except (SyntaxError, ValueError, TypeError):
            return False
    
    >>> is_valid_variable_name('X')
    True
    >>> is_valid_variable_name('123')
    False
    >>> is_valid_variable_name('for')
    False
    >>> is_valid_variable_name('')
    False
    >>> is_valid_variable_name(42)
    False
    

    This will parse the assignment statement without actually executing it. It will pick up invalid identifiers as well as attempts to assign to a keyword. In the above code None is an arbitrary value to assign to the given name - it could be any valid expression for the RHS.
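
    One limitation is that input such as "foo = bar; qux" or "a.b" also parses as a valid assignment target. A stricter sketch, assuming Python 3, inspects the parsed tree and accepts only a single assignment to a single plain name that matches the input exactly (the function name is_single_name is hypothetical):

```python
import ast


def is_single_name(name):
    """True only if `name` parses as exactly one plain variable name."""
    if not isinstance(name, str):
        return False
    try:
        tree = ast.parse('{} = None'.format(name))
    except (SyntaxError, ValueError):
        return False
    # Exactly one statement, and it must be an assignment.
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        return False
    targets = tree.body[0].targets
    # One target, a bare name, identical to what the user typed.
    return (
        len(targets) == 1
        and isinstance(targets[0], ast.Name)
        and targets[0].id == name
    )


for candidate in ('X', 'for', 'a.b', 'foo = bar; qux'):
    print(repr(candidate), is_single_name(candidate))
```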
