How to split a string by commas positioned outside of parenthesis?

前端未结

关注

 10  2060

I got a string of such format:

\"Wilbur Smith (Billy, son of John), Eddie Murphy (John), Elvis Presley, Jane Doe (Jane Doe)\"

so basicly i

相关标签:

10条回答

Happy的楠姐

2020-11-27 20:13

Here's a general technique I've used in the past for such cases:

Use the sub function of the re module with a function as replacement argument. The function keeps track of opening and closing parens, brackets and braces, as well as single and double quotes, and performs a replacement only outside of such bracketed and quoted substrings. You can then replace the non-bracketed/quoted commas with another character which you're sure doesn't appear in the string (I use the ASCII/Unicode group-separator: chr(29) code), then do a simple string.split on that character. Here's the code:

import re
def srchrepl(srch, repl, string):
    """Replace non-bracketed/quoted occurrences of srch with repl in string"""

    resrchrepl = re.compile(r"""(?P<lbrkt>[([{])|(?P<quote>['"])|(?P<sep>["""
                            + srch + """])|(?P<rbrkt>[)\]}])""")
    return resrchrepl.sub(_subfact(repl), string)

def _subfact(repl):
    """Replacement function factory for regex sub method in srchrepl."""
    level = 0
    qtflags = 0
    def subf(mo):
        nonlocal level, qtflags
        sepfound = mo.group('sep')
        if  sepfound:
            if level == 0 and qtflags == 0:
                return repl
            else:
                return mo.group(0)
        elif mo.group('lbrkt'):
            level += 1
            return mo.group(0)
        elif mo.group('quote') == "'":
            qtflags ^= 1            # toggle bit 1
            return "'"
        elif mo.group('quote') == '"':
            qtflags ^= 2            # toggle bit 2
            return '"'
        elif mo.group('rbrkt'):
            level -= 1
            return mo.group(0)
    return subf

If you don't have nonlocal in your version of Python, just change it to global and define level and qtflags at the module level.

Here's how it's used:

>>> GRPSEP = chr(29)
>>> string = "Wilbur Smith (Billy, son of John), Eddie Murphy (John), Elvis Presley, Jane Doe (Jane Doe)"
>>> lst = srchrepl(',', GRPSEP, string).split(GRPSEP)
>>> lst
['Wilbur Smith (Billy, son of John)', ' Eddie Murphy (John)', ' Elvis Presley', ' Jane Doe (Jane Doe)']

0 讨论(0)

耶瑟儿～

2020-11-27 20:14
```
s = re.split(r',\s*(?=[^)]*(?:\(|$))', x) 
```
The lookahead matches everything up to the next open-parenthesis or to the end of the string, iff there's no close-parenthesis in between. That ensures that the comma is not inside a set of parentheses.
0 讨论(0)
发布评论:

提交评论
- 加载中...
我在风中等你

2020-11-27 20:17

This post helped me a lot. I was looking to split a string by commas positioned outside quotes. I used this as a starter. My final line of code was regEx = re.compile(r'(?:[^,"]|"[^"]*")+') This did the trick. Thanks a ton.

0 讨论(0)
发布评论:

提交评论
- 加载中...

难免孤独

2020-11-27 20:19

An attempt on human-readable regex:

import re

regex = re.compile(r"""
    # name starts and ends on word boundary
    # no '(' or commas in the name
    (?P<name>\b[^(,]+\b)
    \s*
    # everything inside parentheses is a role
    (?:\(
      (?P<role>[^)]+)
    \))? # role is optional
    """, re.VERBOSE)

s = ("Wilbur Smith (Billy, son of John), Eddie Murphy (John), Elvis Presley,"
     "Jane Doe (Jane Doe)")
print re.findall(regex, s)

Output:

[('Wilbur Smith', 'Billy, son of John'), ('Eddie Murphy', 'John'), 
 ('Elvis Presley', ''), ('Jane Doe', 'Jane Doe')]

0 讨论(0)

上一页 1 2