How to split a string into command line arguments like the shell in Python?

情话喂你 2020-12-20 15:21

I have command line arguments in a string and I need to split it to feed to argparse.ArgumentParser.parse_args.

I see that the documentation uses

3 Answers
  • 2020-12-20 15:30

    This is what shlex.split was created for.
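
    As a minimal sketch of how this fits the question, the string can be split with shlex.split and the resulting list passed to parse_args. The parser, its options (--name, --count, files), and the sample command line are hypothetical and exist only for illustration.

```python
import argparse
import shlex

# Hypothetical parser, defined only for this illustration.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--name")
parser.add_argument("--count", type=int, default=1)
parser.add_argument("files", nargs="*")

# Sample command line held in a single string (an assumption for the demo).
command_line = '--name "Jane Doe" --count 3 a.txt "b c.txt"'

# shlex.split applies POSIX shell quoting rules, so quoted parts stay together.
argv = shlex.split(command_line)

# parse_args accepts an explicit list instead of reading sys.argv.
args = parser.parse_args(argv)

print(argv)
print(args)
```

    Passing the list explicitly means sys.argv is never touched, which also makes the parser easy to exercise from tests.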

  • 2020-12-20 15:49

    You could use the split_arg_string helper function from the click package:

    import re
    
    def split_arg_string(string):
        """Given an argument string this attempts to split it into small parts."""
        rv = []
        for match in re.finditer(r"('([^'\\]*(?:\\.[^'\\]*)*)'"
                                 r'|"([^"\\]*(?:\\.[^"\\]*)*)"'
                                 r'|\S+)\s*', string, re.S):
            arg = match.group().strip()
            if arg[:1] == arg[-1:] and arg[:1] in '"\'':
                arg = arg[1:-1].encode('ascii', 'backslashreplace') \
                    .decode('unicode-escape')
            try:
                arg = type(string)(arg)
            except UnicodeError:
                pass
            rv.append(arg)
        return rv
    

    For example:

    >>> print(split_arg_string('"this is a test" 1 2 "1 \\" 2"'))
    ['this is a test', '1', '2', '1 " 2']
    

    The click package is becoming a dominant choice for command-line argument parsing, but I don't think it supports parsing arguments from a string (only from argv). The helper function above is used only for bash completion.

    Edit: I can only recommend using shlex.split(), as suggested in the answer by @ShadowRanger. The only reason I'm not deleting this answer is that it splits somewhat faster than the full-blown pure-Python tokenizer used in shlex (around 3.5x faster for the example above, 5.9us vs 20.5us). However, that shouldn't be a reason to prefer it over shlex.
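
    For comparison, here is a small sketch that runs shlex.split on the same example string and times it with timeit. The variable names and the repetition count are arbitrary choices, and the timing will vary by machine and Python version, so no figure is shown.

```python
import shlex
import timeit

# Same input as the split_arg_string example.
sample = '"this is a test" 1 2 "1 \\" 2"'

# shlex.split follows POSIX shell rules: inside double quotes, \" is a literal quote.
tokens = shlex.split(sample)
print(tokens)

# Rough per-call timing; absolute numbers depend on the environment.
runs = 10_000
seconds = timeit.timeit(lambda: shlex.split(sample), number=runs)
print(f"{seconds / runs * 1e6:.1f} us per call")
```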

  • 2020-12-20 15:55

    If you're parsing a Windows-style command line, then shlex.split doesn't work correctly: calling subprocess functions on the result will not behave the same as passing the string directly to the shell.

    In that case, the most reliable way to split a string like the command-line arguments to Python is... to pass command-line arguments to Python:

    import os  # needed for the os.name check below
    import sys
    import subprocess
    import shlex
    import json  # json is an easy way to send arbitrary ascii-safe lists of strings out of python
    
    def shell_split(cmd):
        """
        Like `shlex.split`, but uses the Windows splitting syntax when run on Windows.
    
        On Windows, this is the inverse of subprocess.list2cmdline.
        """
        if os.name == 'posix':
            return shlex.split(cmd)
        else:
            # TODO: write a version of this that doesn't invoke a subprocess
            if not cmd:
                return []
            full_cmd = '{} {}'.format(
                subprocess.list2cmdline([
                    sys.executable, '-c',
                    'import sys, json; print(json.dumps(sys.argv[1:]))'
                ]), cmd
            )
            ret = subprocess.check_output(full_cmd).decode()
            return json.loads(ret)
    

    One example of how these differ:

    # Windows does not treat all backslashes as escapes
    >>> shell_split(r'C:\Users\me\some_file.txt "file with spaces"')
    ['C:\\Users\\me\\some_file.txt', 'file with spaces']
    
    # posix does
    >>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"')
    ['C:Usersmesome_file.txt', 'file with spaces']
    
    # non-posix does not mean Windows - this produces extra quotes
    >>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"', posix=False)
    ['C:\\Users\\me\\some_file.txt', '"file with spaces"']  
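
    The docstring above describes shell_split as the inverse of subprocess.list2cmdline on Windows. As a hedged sketch of the opposite direction, the snippet below joins an argument list back into a single string under each convention. list2cmdline is pure Python and callable on any platform, though it is not part of the formally documented subprocess API; shlex.join requires Python 3.8 or later.

```python
import shlex
import subprocess

# Argument list taken from the example above.
args = [r'C:\Users\me\some_file.txt', 'file with spaces']

# Windows (MS C runtime) quoting: backslashes stay literal, spaces force double quotes.
windows_cmd = subprocess.list2cmdline(args)
print(windows_cmd)

# POSIX quoting: anything with special characters is wrapped in single quotes.
posix_cmd = shlex.join(args)
print(posix_cmd)

# On the POSIX side, shlex.split undoes shlex.join.
round_trip = shlex.split(posix_cmd)
print(round_trip == args)
```

    The two strings differ, which is why a string produced for one convention should be split with the matching splitter.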
    