I've been trying to create Regex for WhatsApp chat log.
So far I've been able to achieve this
Click Here for the test link
By creating the following Regex:
(?P<datetime>\d{2}\/\d{2}\/\d{4},\s\d(?:\d)?:\d{2} [pa].m.)\s-\s(?P<name>[^:]*):(?P<message>.*)
The problem with this regex is, it is not able to match big messages which span multiple lines with line breaks. You can see the issue in the link provided above.
Help would be appreciated.
Thank you.
There you go:
^
(?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+
(?P<name>[^:]+):\s+
(?P<message>[\s\S]+?)
(?=^\d{2}|\Z)
See your modified demo on regex101.com.
Essentially, I added anchors, simplified your datetime part and inserted a
[\s\S]+?
which means: match anything lazily (including newlines) up to the following condition which is a lookahead. The lookahead makes sure there's either another two digits right after a newline (could be tightened!) or the very end of the string.
The dot does not match newline characters, which is why you only get the first line matched. The matching behaviour of a regular expression engine can usually be modified with flags.
On the regexp101 page, you can click on Set Regex Options (the flag right next to the regular expression input field) and activate Single line, then the dot will also match \n
.
But then you have to modify your expression so that it detects the start of the next message, otherwise everything will be interpreted as one message.
来源:https://stackoverflow.com/questions/50280191/regex-to-match-whatsapp-chat-log