Python, how to parse strings to look like sys.argv

最后都变了- 提交于 2019-11-28 18:11:38

I believe you want the shlex module.

>>> import shlex
>>> shlex.split('-o 1 --long "Some long string"')
['-o', '1', '--long', 'Some long string']

Before I was aware of shlex.split, I made the following:

import sys

_WORD_DIVIDERS = set((' ', '\t', '\r', '\n'))

_QUOTE_CHARS_DICT = {
    '\\':   '\\',
    ' ':    ' ',
    '"':    '"',
    'r':    '\r',
    'n':    '\n',
    't':    '\t',
}

def _raise_type_error():
    raise TypeError("Bytes must be decoded to Unicode first")

def parse_to_argv_gen(instring):
    is_in_quotes = False
    instring_iter = iter(instring)
    join_string = instring[0:0]

    c_list = []
    c = ' '
    while True:
        # Skip whitespace
        try:
            while True:
                if not isinstance(c, str) and sys.version_info[0] >= 3:
                    _raise_type_error()
                if c not in _WORD_DIVIDERS:
                    break
                c = next(instring_iter)
        except StopIteration:
            break
        # Read word
        try:
            while True:
                if not isinstance(c, str) and sys.version_info[0] >= 3:
                    _raise_type_error()
                if not is_in_quotes and c in _WORD_DIVIDERS:
                    break
                if c == '"':
                    is_in_quotes = not is_in_quotes
                    c = None
                elif c == '\\':
                    c = next(instring_iter)
                    c = _QUOTE_CHARS_DICT.get(c)
                if c is not None:
                    c_list.append(c)
                c = next(instring_iter)
            yield join_string.join(c_list)
            c_list = []
        except StopIteration:
            yield join_string.join(c_list)
            break

def parse_to_argv(instring):
    return list(parse_to_argv_gen(instring))

This works with Python 2.x and 3.x. On Python 2.x, it works directly with byte strings and Unicode strings. On Python 3.x, it only accepts [Unicode] strings, not bytes objects.

This doesn't behave exactly the same as shell argv splitting—it also allows quoting of CR, LF and TAB characters as \r, \n and \t, converting them to real CR, LF, TAB (shlex.split doesn't do that). So writing my own function was useful for my needs. I guess shlex.split is better if you just want plain shell-style argv splitting. I'm sharing this code in case it's useful as a baseline for doing something slightly different.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!