Regex Expression For a String

余生分开走 2021-01-23 10:22

I want to split a string in Python.

Sample string:

Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more

3 answers
  •  [愿得一人]
    2021-01-23 10:56

    If I understand your requirements correctly, you may use the following pattern:

    (?:ACT|SCENE).+?\d+|\S.*?(?=\s?(?:ACT|SCENE|$))
    


    Breakdown:

    (?:                    # Start of a non-capturing group.
        ACT|SCENE          # Matches either 'ACT' or 'SCENE'.
    )                      # Close the non-capturing group.
    .+?                    # Matches one or more characters (lazy matching).
    \d+                    # Matches one or more digits.
    |                      # Alternation (OR).
    \S                     # Matches a non-whitespace character (to trim spaces).
    .*?                    # Matches zero or more characters (lazy matching).
    (?=                    # Start of a positive Lookahead (i.e., followed by...).
        \s?                # An optional whitespace character (to trim spaces).
        (?:ACT|SCENE|$)    # Followed by either 'ACT' or 'SCENE' or the end of the string.
    )                      # Close the Lookahead.
    

    Python example:

    import re
    
    regex = r"(?:ACT|SCENE).+?\d+|\S.*?(?=\s?(?:ACT|SCENE|$))"
    test_str = "Hi this is ACT I. SCENE 1 and SCENE 2 and this is ACT II. SCENE 1 and SCENE 2 and more"
    
    # Avoid naming the result `list`, which would shadow the built-in type.
    matches = re.findall(regex, test_str)
    print(matches)
    

    Output:

    ['Hi this is', 'ACT I. SCENE 1', 'and', 'SCENE 2', 'and this is', 'ACT II. SCENE 1', 'and', 'SCENE 2', 'and more']
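
The breakdown above can also be kept inside the pattern itself. The following is a minimal sketch, assuming the same pattern and using Python's `re.VERBOSE` flag so that whitespace in the pattern is ignored and `#` starts a comment. The helper name `split_scenes` is hypothetical and only used for illustration.

```python
import re

# Same pattern as in the answer, spread out with re.VERBOSE so that
# whitespace in the pattern is ignored and '#' starts a comment.
PATTERN = re.compile(
    r"""
    (?:ACT|SCENE)            # 'ACT' or 'SCENE'
    .+?                      # one or more characters, lazily
    \d+                      # one or more digits
    |                        # OR
    \S                       # a non-whitespace character
    .*?                      # zero or more characters, lazily
    (?=\s?(?:ACT|SCENE|$))   # stop before 'ACT', 'SCENE' or the end
    """,
    re.VERBOSE,
)


def split_scenes(text):  # hypothetical helper name
    """Return the pieces of text found by PATTERN, in order."""
    return PATTERN.findall(text)


print(split_scenes("Hi this is ACT I. SCENE 1 and SCENE 2 and more"))
```

Note that `ACT` and `SCENE` are matched without word boundaries, so an input such as `FACT 1` would be split into `F` and `ACT 1`. If that matters for your data, adding `\b` around the keywords may be worth trying.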
    

