How to replace dash between characters with space using regex

猫巷女王i 2021-01-18 11:56

I want to use a regex to replace dashes that appear between letters with a space. For example, ab-cd should become ab cd.

The following matc

4 answers
  • 2021-01-18 12:28

    Use references to capturing groups:

    >>> import re
    >>> original_term = 'ab-cd'
    >>> re.sub(r"([A-Za-z])\-([A-Za-z])", r"\1 \2", original_term)
    'ab cd'
    

    This assumes, of course, that you can't just do original_term.replace('-', ' ') for whatever reason. Perhaps your text uses hyphens where it should use en dashes or something.
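A short sketch of why a plain str.replace may not be enough, assuming the input can also contain dashes that are not between letters. The sample string is invented for illustration.

```python
import re

# Hypothetical sample: one dash between letters, one spaced dash, one leading dash.
sample = "ab-cd - ef -5"

# str.replace touches every dash, wherever it appears.
replaced_all = sample.replace("-", " ")

# The regex only touches a dash that has a letter on both sides.
replaced_between = re.sub(r"([A-Za-z])-([A-Za-z])", r"\1 \2", sample)

print(repr(replaced_all))
print(repr(replaced_between))
```

Only the dash in ab-cd is changed by the regex version; the other two dashes are left alone.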

  • 2021-01-18 12:28

    You need to use look-arounds:

     new_term = re.sub(r"(?<=[A-Za-z])-(?=[A-Za-z])", " ", original_term)
    

    Or capturing groups:

     new_term = re.sub(r"([A-Za-z])-(?=[A-Za-z])", r"\1 ", original_term)
    


    Note that [A-z] also matches some non-letters (namely [, \, ], ^, _, and `), so I suggest replacing it with [A-Za-z], or with [A-Z] combined with the case-insensitive modifier (?i).
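The point about [A-z] can be checked directly. The sketch below is only an illustration; the test strings are made up and the variable names are not from the answer.

```python
import re

# The six ASCII characters that sit between "Z" and "a".
extras = "[\\]^_`"

# [A-z] accepts all of them, [A-Za-z] accepts none.
matched_by_wide = [ch for ch in extras if re.fullmatch(r"[A-z]", ch)]
matched_by_strict = [ch for ch in extras if re.fullmatch(r"[A-Za-z]", ch)]

# A dash between two underscores is replaced by the wide class only.
wide = re.sub(r"(?<=[A-z])-(?=[A-z])", " ", "_-_")
strict = re.sub(r"(?<=[A-Za-z])-(?=[A-Za-z])", " ", "_-_")

# The case-insensitive variant with [A-Z] and (?i).
insensitive = re.sub(r"(?i)(?<=[A-Z])-(?=[A-Z])", " ", "ab-cd")

print(matched_by_wide, matched_by_strict)
print(repr(wide), repr(strict), repr(insensitive))
```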

    Note that you do not have to escape a hyphen outside a character class.
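As a small illustration of the escaping rule, the following sketch compares an escaped and an unescaped hyphen outside a character class, and shows where the hyphen does become special. The sample strings are assumptions chosen for the demonstration.

```python
import re

# Outside a character class, "-" and "\-" match the same literal hyphen.
plain = re.sub(r"b-c", "b c", "ab-cd")
escaped = re.sub(r"b\-c", "b c", "ab-cd")

# Inside a class, an unescaped hyphen between two characters forms a range,
# while an escaped one is a literal hyphen.
as_range = re.findall(r"[a-c]", "abc-")
as_literal = re.findall(r"[a\-c]", "abc-")

print(plain, escaped)
print(as_range, as_literal)
```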

  • 2021-01-18 12:40

    You need to capture the characters before and after the - in groups and reuse them in the replacement, i.e.:

    import re
    subject = "ab-cd"
    # Capture the letter on each side of the dash and reinsert both around a space.
    subject = re.sub(r"([a-z])\-([a-z])", r"\1 \2", subject, flags=re.IGNORECASE)
    print(subject)
    # ab cd
    

    DEMO

    http://ideone.com/LAYQWT


    REGEX EXPLANATION

    ([a-z])\-([a-z])
    
    Match the regex below and capture its match into backreference number 1 «([a-z])»
       Match a single character in the range between “a” and “z” (either case, because of re.IGNORECASE) «[a-z]»
    Match the character “-” literally «\-»
    Match the regex below and capture its match into backreference number 2 «([a-z])»
       Match a single character in the range between “a” and “z” (either case, because of re.IGNORECASE) «[a-z]»
    
    \1 \2
    
    Insert the text that was last matched by capturing group number 1 «\1»
    Insert the character “ ” literally « »
    Insert the text that was last matched by capturing group number 2 «\2»
    
  • 2021-01-18 12:45

    re.sub() always replaces the whole matched sequence with the replacement.

    One way to replace only the dash is to use lookahead and lookbehind assertions. They are zero-width, so the letters they check do not become part of the matched sequence.

    new_term = re.sub(r"(?<=[A-Za-z])\-(?=[A-Za-z])", " ", original_term)
    

    The syntax is explained in the Python documentation for the re module.
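The practical difference between consuming the neighbouring letters and only asserting them shows up when single letters are separated by dashes. The sketch below uses a made-up input, a-b-c, to compare the approaches discussed in this thread.

```python
import re

text = "a-b-c"

# Two capturing groups consume the letter after the dash, so the scan resumes
# after "b" and the second dash no longer has a letter available before it.
consuming = re.sub(r"([A-Za-z])-([A-Za-z])", r"\1 \2", text)

# Lookarounds consume only the dash, so every dash between letters is replaced.
zero_width = re.sub(r"(?<=[A-Za-z])-(?=[A-Za-z])", " ", text)

# One capturing group plus a lookahead leaves the following letter unconsumed.
mixed = re.sub(r"([A-Za-z])-(?=[A-Za-z])", r"\1 ", text)

print(consuming)
print(zero_width)
print(mixed)
```

With inputs such as ab-cd all three give the same result; they only diverge when a letter has a dash on both sides.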
