Question
Log file:
INFO:werkzeug:127.0.0.1 - - [20/Sep/2018 19:40:00] "GET /socket.io/?polling HTTP/1.1" 200 -
INFO:engineio: Received packet MESSAGE, ["key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}]
I'm interested in extracting only the text within the brackets that contain the keyword "key",
not every occurrence that matches the regex pattern below.
Here is what I have tried so far:
import re
with open('logfile.log', 'r') as text_file:
    matches = re.findall(r'\[([^\]]+)', text_file.read())
with open('output.txt', 'w') as out:
    out.write('\n'.join(matches))
This outputs every occurrence that matches the regex. The desired content of output.txt would look like this:
"key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}
Answer 1:
Text within square brackets that cannot itself contain [ or ] can be matched with the negated character class [^][], which matches any character other than ] and [.
That is, you may match a whole bracketed group with \[[^][]*]. If the group must also contain some specific text, put that text after [^][]* and then append another occurrence of [^][]* before the closing ].
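As a minimal sketch of the difference between the two forms described above, the snippet below runs both against a made-up sample string. The string and the variable names are illustrative assumptions and are not taken from the log file in the question.

```python
import re

# Hypothetical sample text with three bracketed groups.
sample = 'a [one] b [two "key" three] c [four]'

# Every bracketed group that has no [ or ] inside it.
all_groups = re.findall(r'\[([^][]*)]', sample)

# Only the groups whose content includes the literal text "key" (quotes included).
key_groups = re.findall(r'\[([^][]*"key"[^][]*)]', sample)

print(all_groups)
print(key_groups)
```

Because [^][] excludes both bracket characters, a match can never run past the closing ] of one group into the next, which is why the first and third groups are skipped by the second pattern rather than merged into a longer match.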
You may use
re.findall(r'\[([^][]*"key"[^][]*)]', text_file.read())
See the Python demo:
import re
s = '''INFO:werkzeug:127.0.0.1 - - [20/Sep/2018 19:40:00] "GET /socket.io/?polling HTTP/1.1" 200 -
INFO:engineio: Received packet MESSAGE, ["key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}]'''
print(re.findall(r'\[([^][]*"key"[^][]*)]', s))
Output:
['"key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}']
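To connect this back to the file-based code in the question, here is one possible sketch that applies the same pattern line by line and writes the matches to output.txt. The helper names extract_key_groups and main are hypothetical, and processing the file one line at a time assumes that each log entry fits on a single line.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
KEY_PATTERN = re.compile(r'\[([^][]*"key"[^][]*)]')


def extract_key_groups(lines):
    """Collect the bracket contents that include "key" from an iterable of lines."""
    matches = []
    for line in lines:
        matches.extend(KEY_PATTERN.findall(line))
    return matches


def main(log_path='logfile.log', out_path='output.txt'):
    # File names follow the question; adjust them to your setup.
    with open(log_path, 'r') as text_file:
        matches = extract_key_groups(text_file)
    with open(out_path, 'w') as out:
        out.write('\n'.join(matches))
```

Calling main() would read logfile.log and write one match per line to output.txt. Note that, as the answer states, the pattern only handles bracketed text with no [ or ] inside it, so a payload containing nested brackets would not be captured by this sketch.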
Source: https://stackoverflow.com/questions/52447842/extract-occurrence-of-text-between-brackets-from-a-text-file-python