Make one word case sensitive and the other case insensitive in a Python regex | (pipe) alternation

猫巷女王i 2020-12-01 22:21

I understand the meaning of | (the pipe special character) in Python regex: it matches either the first or the second alternative.

Example: a|b matches either a or b.


2 answers
  • 2020-12-01 22:37

    In Python 3.6 and later, you may use the inline modifier groups:

    >>> import re
    >>> s = "Welcome to PuNe, Maharashtra"
    >>> print(re.findall(r"PuNe|(?i:MaHaRaShTrA)", s))
    ['PuNe', 'Maharashtra']
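
    To make the scope of the inline group visible, here is a minimal sketch (assuming Python 3.6 or later; the helper name find_words is hypothetical and not part of the original answer). Only the text inside (?i:...) ignores case, so a lowercase "pune" is not matched:

```python
import re

# Only the second alternative is wrapped in (?i:...), so only it ignores case.
PATTERN = re.compile(r"PuNe|(?i:MaHaRaShTrA)")

def find_words(text):
    """Return every match of the mixed-sensitivity pattern (hypothetical helper)."""
    return PATTERN.findall(text)

print(find_words("Welcome to PuNe, Maharashtra"))   # ['PuNe', 'Maharashtra']
print(find_words("welcome to pune, MAHARASHTRA"))   # ['MAHARASHTRA']
```

    The first alternative stays case sensitive because the flag is restored as soon as the group closes.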
    

    See the relevant Python re documentation:

    (?aiLmsux-imsx:...)
       (Zero or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x', optionally followed by '-' followed by one or more letters from the 'i', 'm', 's', 'x'.) The letters set or remove the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode matching), and re.X (verbose), for the part of the expression. (The flags are described in Module Contents.)

    The letters 'a', 'L' and 'u' are mutually exclusive when used as inline flags, so they can’t be combined or follow '-'. Instead, when one of them appears in an inline group, it overrides the matching mode in the enclosing group. In Unicode patterns (?a:...) switches to ASCII-only matching, and (?u:...) switches to Unicode matching (default). In byte pattern (?L:...) switches to locale depending matching, and (?a:...) switches to ASCII-only matching (default). This override is only in effect for the narrow inline group, and the original matching mode is restored outside of the group.

    New in version 3.6.

    Changed in version 3.7: The letters 'a', 'L' and 'u' also can be used in a group.
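
    The quoted syntax also allows removing a flag with '-'. As a sketch of the reverse approach (again assuming Python 3.6+, with a hypothetical function name), the whole pattern can be compiled with re.I and case sensitivity restored for one word only:

```python
import re

def find_with_global_ignorecase(text):
    """Hypothetical helper: re.I applies to the whole pattern,
    and (?-i:...) switches it off for the PuNe alternative only."""
    return re.findall(r"(?-i:PuNe)|MaHaRaShTrA", text, flags=re.I)

print(find_with_global_ignorecase("pune PuNe maharashtra"))  # ['PuNe', 'maharashtra']
```

    Which form reads better depends on whether most of the pattern should be case sensitive or case insensitive.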

    Unfortunately, the re module in Python versions before 3.6 supported neither these groups nor switching inline modifiers on and off partway through a pattern.

    If you can use the PyPI regex module (a third-party package), the (?i:...) construct is available there as well:

    import regex
    s = "Welcome to PuNe, Maharashtra"
    print(regex.findall(r"PuNe|(?i:MaHaRaShTrA)",s))
    


  • 2020-12-01 22:49

    You could generate a character class holding the upper- and lowercase form of each letter in the second word, and leave the regex itself case sensitive:

    my_regex = "PuNe|"+"".join("[{}{}]".format(x.upper(),x.lower()) for x in "MaHaRaShTrA")
    

    This generates: PuNe|[Mm][Aa][Hh][Aa][Rr][Aa][Ss][Hh][Tt][Rr][Aa]

    re.search(my_regex, s1), where s1 is your input string, then does what you want without any flags.
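
    The one-liner above assumes the word contains only letters. A slightly more defensive sketch (the function name case_insensitive_fragment is hypothetical, and it assumes mostly ASCII input) escapes any other character with re.escape so that punctuation such as "." is matched literally:

```python
import re

def case_insensitive_fragment(word):
    """Build a pattern fragment matching `word` in any letter case
    without using re.I (hypothetical helper, not from the original answer)."""
    parts = []
    for ch in word:
        if ch.upper() != ch.lower():
            # Cased letter: accept both forms via a character class.
            parts.append("[{}{}]".format(ch.upper(), ch.lower()))
        else:
            # Digits, punctuation, spaces: match literally.
            parts.append(re.escape(ch))
    return "".join(parts)

my_regex = "PuNe|" + case_insensitive_fragment("MaHaRaShTrA")
print(my_regex)
print(re.findall(my_regex, "Welcome to PuNe, mahaRASHTRA"))
```

    This works on Python versions older than 3.6, at the cost of a longer and less readable pattern.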
