Replacing repeated captures

悲哀的现实 2021-02-18 13:11

This is a follow-up to the "Python regex - Replace single quotes and brackets" thread.

The task:

Sample input strings:

RSQ(nam         


        
4 Answers
  • 2021-02-18 13:19

    You can indeed use the third-party regex module and its repeated captures. The main advantage is that you can check the structure of the matched string:

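    import regex  # third-party module (pip install regex), not the stdlib re
    
    s = "RSQ(name['BAKD DK'], name['A DKJ'])"
    
    # VERBOSE mode ignores unescaped whitespace in the pattern; [ ] matches a literal space.
    regO = regex.compile(r'''
        \w+ \( (?: name\['([^']*)'] (?: ,[ ] | (?=\)) ) )* \)
        ''', regex.VERBOSE)
    
    # m.captures(1) returns every value group 1 matched, not just the last one.
    regO.sub(lambda m: 'XYZ(' + (', '.join(m.captures(1))) + ')', s)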
    

    (Note that you can replace "name" with \w+ or anything else you want without problems.)
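For contrast, here is a minimal sketch of what happens with the standard-library re module, which has no captures() method: a group inside a repetition only remembers its last match. The helper name extract_names and the two-step validate-then-collect approach are illustrative assumptions, not part of the answer above.

```python
import re

# Same structure as the regex-module pattern, written for the stdlib re module.
STRUCTURE = re.compile(r"\w+\((?:name\['([^']*)'\](?:, |(?=\))))*\)")
ITEM = re.compile(r"name\['([^']*)'\]")

def extract_names(s):
    """Validate the whole call first, then collect every name[...] key."""
    if STRUCTURE.fullmatch(s) is None:
        raise ValueError("unexpected input structure: {!r}".format(s))
    return ITEM.findall(s)

s = "RSQ(name['BAKD DK'], name['A DKJ'])"

# With re, a repeated group keeps only the value from its final iteration.
last_only = STRUCTURE.fullmatch(s).group(1)

all_names = extract_names(s)
result = "XYZ({})".format(", ".join(all_names))
```

Here last_only holds just the final key, which is why the answer reaches for regex and m.captures(1) when every repetition is needed in a single pass.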

  • 2021-02-18 13:19

    You could do this, though I don't think it's very readable, and it could get unruly if you start adding more patterns to replace. It takes advantage of the fact that the replacement argument of re.sub can also be a function.

    import re
    
    s = "RSQ(name['BAKD DK'], name['A DKJ'])"
    # Group 1 matches the leading function name; group 2 matches the key inside name['...'].
    re.sub(r"^(\w+)|name\['(.*?)'\]", lambda m: 'XYZ' if m.group(1) else m.group(2), s)
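If readability is the concern, the same idea can be sketched with named groups and a named replacement function instead of a lambda. The names PATTERN, replace and rewrite are hypothetical, chosen only for this illustration.

```python
import re

# Named groups make it clearer which alternative matched.
PATTERN = re.compile(r"^(?P<func>\w+)|name\['(?P<key>.*?)'\]")

def replace(match):
    """Called by re.sub once per match; the return value is the replacement text."""
    if match.group("func") is not None:
        return "XYZ"           # leading function name, e.g. RSQ
    return match.group("key")  # text between the quotes of name['...']

def rewrite(s):
    return PATTERN.sub(replace, s)

print(rewrite("RSQ(name['BAKD DK'], name['A DKJ'])"))
```

Each additional pattern would still need its own alternative and its own branch in replace, so the concern about this approach getting unruly still applies.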
    
  • 2021-02-18 13:32

    Please do not do this in any code I have to maintain.

    You are trying to parse syntactically valid Python. Use ast for that. It's more readable, easier to extend to new syntax, and won't fall apart on some weird corner case.

    Working sample:

    from ast import parse
    
    l = [
        "RSQ(name['BAKD DK'], name['A DKJ'])",
        "SMT(name['BAKD DK'], name['A DKJ'], name['S QRT'])"
    ]
    
    for item in l:
        tree = parse(item)
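        # Python 3.9+: arg.slice is the ast.Constant node itself.
        # On 3.8 and earlier it is wrapped in ast.Index, so use arg.slice.value.value there.
        args = [arg.slice.value for arg in tree.body[0].value.args]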
    
        output = "XYZ({})".format(", ".join(args))
        print(output)
    

    Prints:

    XYZ(BAKD DK, A DKJ)
    XYZ(BAKD DK, A DKJ, S QRT)
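A slightly more defensive sketch of the same ast idea, assuming Python 3.9 or later (where a simple subscript's slice is the expression node itself). The function name rewrite_call and the new_name parameter are hypothetical, and the node-type checks are an addition that is not in the answer above.

```python
import ast

def rewrite_call(source, new_name="XYZ"):
    """Turn "F(name['a'], name['b'])" into "XYZ(a, b)", rejecting anything else."""
    # mode="eval" parses a single expression, so tree.body is the expression node.
    call = ast.parse(source, mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a single function call")

    keys = []
    for arg in call.args:
        is_string_subscript = (
            isinstance(arg, ast.Subscript)
            and isinstance(arg.slice, ast.Constant)  # Python 3.9+ layout
            and isinstance(arg.slice.value, str)
        )
        if not is_string_subscript:
            raise ValueError("expected arguments of the form name['...']")
        keys.append(arg.slice.value)

    return "{}({})".format(new_name, ", ".join(keys))

print(rewrite_call("SMT(name['BAKD DK'], name['A DKJ'], name['S QRT'])"))
```

Input that is not a call, or whose arguments are not string subscripts, raises ValueError instead of producing a partial result.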
    
  • 2021-02-18 13:37

    You can use re.findall() and a simple string formatting:

    >>> import re
    >>> s = "SMT(name['BAKD DK'], name['A DKJ'], name['S QRT'])"
    >>> 'XYZ({})'.format(','.join(re.findall(r"'([^']+)'", s)))
    'XYZ(BAKD DK,A DKJ,S QRT)'
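One thing to keep in mind with this approach is that re.findall collects every single-quoted substring and does not check what surrounds it. A small sketch of that behaviour; the helper names quoted_values and to_xyz, and the other['B'] input, are made up for illustration.

```python
import re

def quoted_values(s):
    """Return every non-empty single-quoted substring, wherever it appears."""
    return re.findall(r"'([^']+)'", s)

def to_xyz(s, sep=", "):
    return "XYZ({})".format(sep.join(quoted_values(s)))

expected_shape = to_xyz("SMT(name['BAKD DK'], name['A DKJ'], name['S QRT'])")

# No structural check: a quoted key under a different identifier is collected too.
mixed = quoted_values("RSQ(name['A'], other['B'])")
```

If the inputs are known to be well-formed, this is the shortest option; otherwise a structural check is needed on top of it.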
    