Capture repeated groups in Python regex

不知归路 2021-01-22 03:13

I have a mail log file that looks like this:

Aug 15 00:01:06 **** sm-mta*** to=,

        
2 Answers
  • 2021-01-22 03:41

    If you cannot use the PyPI regex library, do it in two steps: 1) keep only the lines that contain sm-mta, and 2) extract the values you need from those lines, with something like

    import re

    txt="""Aug 15 00:01:06 **** sm-mta*** to=<user1@gmail.com>,<user2@yahoo.com>,user3@aol.com, some_more_stuff
    Aug 16 13:16:09 **** sendmail*** to=<user4@yahoo.com>, some_more_stuff
    Aug 17 11:14:48 **** sm-mta*** to=<user5@gmail.com>,<user6@gmail.com>, some_more_stuff"""
    rx = r'@([^\s>,]+)'
    filtered_lines = [x for x in txt.split('\n') if 'sm-mta' in x]
    print(re.findall(rx, " ".join(filtered_lines)))
    

    See the Python demo online. The @([^\s>,]+) pattern will match @ and will capture and return any 1+ chars other than whitespace, > and ,.
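If the domains need to stay grouped by the log line they came from, the same two-step idea can be applied line by line instead of joining the filtered lines first. The following is only a minimal sketch using the standard re module; the helper name domains_per_line and its marker parameter are made up for illustration and are not part of the original answer.

```python
import re

LOG = """Aug 15 00:01:06 **** sm-mta*** to=<user1@gmail.com>,<user2@yahoo.com>,user3@aol.com, some_more_stuff
Aug 16 13:16:09 **** sendmail*** to=<user4@yahoo.com>, some_more_stuff
Aug 17 11:14:48 **** sm-mta*** to=<user5@gmail.com>,<user6@gmail.com>, some_more_stuff"""

# Same pattern as in the answer: match "@", capture what follows up to
# whitespace, ">" or ",".
DOMAIN_RX = re.compile(r'@([^\s>,]+)')

def domains_per_line(log_text, marker='sm-mta'):
    """Hypothetical helper: one list of domains per line containing `marker`."""
    result = []
    for line in log_text.splitlines():
        if marker in line:                          # step 1: filter the line
            result.append(DOMAIN_RX.findall(line))  # step 2: extract domains
    return result

grouped = domains_per_line(LOG)
print(grouped)
# [['gmail.com', 'yahoo.com', 'aol.com'], ['gmail.com', 'gmail.com']]
```

Keeping one list per line makes it possible to tell which recipients belonged to the same message, which the flat list loses.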

    If you can use the PyPI regex library, you can get the list of strings you need in a single pass with

    >>> import regex
    >>> x="""Aug 15 00:01:06 **** sm-mta*** to=<user1@gmail.com>,<user2@yahoo.com>,user3@aol.com, some_more_stuff
    Aug 16 13:16:09 **** sendmail*** to=<user4@yahoo.com>, some_more_stuff
    Aug 17 11:14:48 **** sm-mta*** to=<user5@gmail.com>,<user6@gmail.com>, some_more_stuff"""
    >>> rx = r'(?:^(?=.*sm-mta)|\G(?!^)).*?@\K[^\s>,]+'
    >>> print(regex.findall(rx, x, regex.M))
    ['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']
    

    See the Python online demo and a regex demo.

    Pattern details

    • (?:^(?=.*sm-mta)|\G(?!^)) - either the start of a line that contains the sm-mta substring after any 0+ chars other than line break chars, or the position where the previous match ended (as long as that position is not the start of a line)
    • .*?@ - any 0+ chars other than line break chars, as few as possible, up to the @ and a @ itself
    • \K - a match reset operator that discards all the text matched so far in the current iteration
    • [^\s>,]+ - 1 or more chars other than whitespace, , and >
  • 2021-01-22 03:57

    Try the regex module.

    x="""Aug 15 00:01:06 **** sm-mta*** to=<user1@gmail.com>,<user2@yahoo.com>,user3@aol.com, some_more_stuff
    Aug 16 13:16:09 **** sendmail*** to=<user4@yahoo.com>, some_more_stuff
    Aug 17 11:14:48 **** sm-mta*** to=<user5@gmail.com>,<user6@gmail.com>, some_more_stuff"""
    import regex
    # Python 3 print call; V1 behaviour is requested through the flags argument
    print(regex.findall(r"sm-mta.*to=\K|\G(?!^).+?@(.*?)[>, ]", x, flags=regex.V1))
    

    Output: ['', 'gmail.com', 'yahoo.com', 'aol.com', '', 'gmail.com', 'gmail.com']

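    Just ignore the empty strings. Each line containing sm-mta produces one: the first alternative (sm-mta.*to=\K) matches there with nothing captured in group 1.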
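Dropping the empty entries takes a single list comprehension. This is only a small sketch: the matches list below is copied from the output shown above rather than recomputed, so it runs without the regex module installed.

```python
# Result list copied verbatim from the output above.
matches = ['', 'gmail.com', 'yahoo.com', 'aol.com', '', 'gmail.com', 'gmail.com']

# Empty strings are falsy, so this keeps only the captured domains.
domains = [m for m in matches if m]
print(domains)
# ['gmail.com', 'yahoo.com', 'aol.com', 'gmail.com', 'gmail.com']
```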

    https://regex101.com/r/7zPc6j/1
