Question
I have an existing Python application, which logs like:

import logging
import json

logger = logging.getLogger()

some_var = 'abc'
data = {
    1: 2,
    'blah': ['hello']
}
logger.info(f"The value of some_var is {some_var} and data is {json.dumps(data)}")

So the logger.info function is given:

The value of some_var is abc and data is {"1": 2, "blah": ["hello"]}

Currently my logs go to AWS CloudWatch, which does some magic and renders this with indentation like:

The value of some_var is abc and data is {
    "1": 2,
    "blah": [
        "hello"
    ]
}
This makes the logs super clear to read.
Now I want to make some changes to my logging, handling it myself with another python script that wraps around my code and emails out logs when there's a failure.
What I want is some way of taking each log entry (or a stream/list of entries), and applying this indentation.
So I want a function which takes in a string, detects which subset(s) of that string are JSON, and inserts newlines and indentation to pretty-print that JSON.
example input:
Hello, {"a": {"b": "c"}} is some json data, but also {"c": [1,2,3]} is too
example output:
Hello,
{
    "a": {
        "b": "c"
    }
}
is some json data, but also
{
    "c": [
        1,
        2,
        3
    ]
}
is too
I have considered splitting each entry into everything before and after the first {: leave the left half as is, and pass the right half to json.dumps(json.loads(x), indent=4).

But what if there's stuff after the JSON object in the log file? OK, we can select everything after the first { and before the last }, then pass the middle bit to the JSON library.

But what if there are two JSON objects in one log entry, like in the above example? We'll have to use a stack to figure out whether any { appears after all prior { have been closed with a corresponding }.

But what if there's something like {"a": "\}"}? Hmm, OK, we need to handle escaping.

Now I find myself having to write a whole JSON parser from scratch. Is there any easy way to do this?

I suppose I could use a regex to replace every instance of json.dumps(x) in my whole repo with json.dumps(x, indent=4). But json.dumps is sometimes used outside logging statements, and it just makes all my logging lines that extra bit longer. Is there a neat, elegant solution?
(Bonus points if it can parse and indent the JSON-like output that str(x) produces in Python. That's basically JSON with single quotes instead of double quotes.)
Answer 1:
In order to extract JSON objects from a string, see this answer. The extract_json_objects() function from that answer handles JSON objects, including nested ones, but nothing else: if you have a bare list in your log outside of a JSON object, it won't be picked up.
In your case, modify the function to also yield the text around the JSON objects, so that you can put everything back together (or replace the logline):
from json import JSONDecoder

def extract_json_objects(text, decoder=JSONDecoder()):
    pos = 0
    while True:
        match = text.find('{', pos)
        if match == -1:
            yield text[pos:]  # return the remaining text
            break
        yield text[pos:match]  # modification for the non-JSON parts
        try:
            result, index = decoder.raw_decode(text[match:])
            yield result
            pos = match + index
        except ValueError:
            pos = match + 1
Use that function to process your loglines, add them to a list of strings, which you then join together to produce a single string for your output, logger, etc.:
import json

def jsonify_logline(line):
    line_parts = []
    for result in extract_json_objects(line):
        if isinstance(result, dict):  # got a JSON obj
            line_parts.append(json.dumps(result, indent=4))
        else:  # got text/non-JSON-obj
            line_parts.append(result)
    # (don't make that a list comprehension, quite unreadable)
    return ''.join(line_parts)
Example:
>>> demo_text = """Hello, {"a": {"b": "c"}} is some json data, but also {"c": [1,2,3]} is too"""
>>> print(jsonify_logline(demo_text))
Hello, {
"a": {
"b": "c"
}
} is some json data, but also {
"c": [
1,
2,
3
]
} is too
>>>
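The question's bonus ask, indenting the single-quoted output that str(x) produces, is not handled by JSONDecoder, which only accepts strict JSON. One hedged sketch (the function name and the naive brace-matching scan are my own, not from the answer above; braces inside string values will confuse the scan) swaps raw_decode for ast.literal_eval, which accepts Python-literal syntax:

```python
import ast
import json

def pretty_print_py_literals(text):
    """Pretty-print dict literals (single- or double-quoted) inside a log line.

    Hypothetical helper: scans for balanced {...} spans and tries
    ast.literal_eval on each, which accepts the Python-repr style
    (single quotes, int keys) that json.loads rejects.
    Limitation: a naive scan like this miscounts braces that appear
    inside string values.
    """
    out, pos = [], 0
    while True:
        start = text.find('{', pos)
        if start == -1:
            out.append(text[pos:])  # no more candidates; keep the tail
            return ''.join(out)
        out.append(text[pos:start])  # keep the non-literal prefix
        # find the matching close brace by depth counting
        depth, end = 0, None
        for i in range(start, len(text)):
            if text[i] == '{':
                depth += 1
            elif text[i] == '}':
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        try:
            obj = ast.literal_eval(text[start:end] if end else None)
            out.append(json.dumps(obj, indent=4))
            pos = end
        except (ValueError, SyntaxError, TypeError):
            out.append(text[start])  # not a literal; keep the brace and move on
            pos = start + 1
```

Because valid JSON objects are also valid Python literals, this one function covers both kinds of input, at the cost of the brace-scan fragility noted above.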
Other things not directly related which would have helped:
- Instead of using json.dumps(x) for all your log lines, follow the DRY principle and create a function like logdump(x) which does whatever you want, e.g. json.dumps(x), json.dumps(x, indent=4), or jsonify_logline(x). That way, if you ever need to change the JSON format of all your logs, you just change that one function; no need for a mass "search & replace", which comes with its own issues and edge cases.
  - You could even add an optional parameter like pretty=True to decide whether to indent.
- You could mass search & replace all your existing loglines to do logger.blah(jsonify_logline(<previous log f-string or text>))
- If you are JSON-dumping custom objects/class instances, use their __str__ method to always output pretty-printed JSON, and __repr__ to be non-pretty/compact.
  - Then you wouldn't need to modify the logline at all: doing logger.info(f'here is my object {x}') would directly invoke obj.__str__.
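The last suggestion can be sketched as follows; the Order class and its fields are a hypothetical example, not from the original question:

```python
import json

class Order:
    """Illustrates the __str__/__repr__ split suggested above."""

    def __init__(self, order_id, items):
        self.order_id = order_id
        self.items = items

    def __str__(self):
        # pretty-printed, for log lines interpolated via f-strings
        return json.dumps(self.__dict__, indent=4)

    def __repr__(self):
        # compact, for debugging collections of objects
        return json.dumps(self.__dict__)

order = Order(42, ["widget"])
# f-string interpolation calls __str__, so the log line arrives already indented
print(f"here is my object {order}")
```

With this in place, every existing logger.info(f"... {x}") line picks up the pretty-printed form without being edited.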
Source: https://stackoverflow.com/questions/61380028/how-to-detect-and-indent-json-substrings-inside-longer-non-json-text