I have a description field that is embedded within JSON, and I'm unable to use JSON libraries to parse this data.
I use {0,23} in my attempt.
You could try this code out:
import re
# Sample value kept within the 23-character limit of .{0,23} so the match can succeed
description = "description\" : \"this is a tesdt \n test\" "
result = re.findall(r'(?<=description")(?:\s*\:\s*)(".{0,23}?(?=")")', description, re.IGNORECASE | re.DOTALL)[0]
print(result)
Which gives you the result of:
"this is a tesdt
 test"
Which is essentially:
\"this is a tesdt \n test\"
And is what you have asked for in the comments.
(?<=description") is a positive look-behind: the match must be immediately preceded by description".
(?:\s*\:\s*) is a non-capturing group saying that description" is followed by zero or more whitespace characters, a colon (:), and again zero or more whitespace characters.
(".{0,23}?(?=")") is the captured group and the actual match desired: a double quote ("), zero to 23 characters of any kind (matched lazily, newlines included because of re.DOTALL), and a closing double quote (") at the end.
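Indexing the findall result with [0] raises IndexError when nothing matches, which happens whenever the quoted value is longer than 23 characters. A minimal sketch of a more forgiving variant follows; the helper name first_description and both sample strings are made up for illustration, and the pattern is the same one explained above.

```python
import re

# Same pattern as above, compiled once with the same flags.
PATTERN = re.compile(r'(?<=description")(?:\s*\:\s*)(".{0,23}?(?=")")',
                     re.IGNORECASE | re.DOTALL)


def first_description(text):
    """Return the quoted value after description", or None if nothing matches."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


# Hypothetical inputs: one value within the 23-character limit, one beyond it.
short = 'description" : "short \n value" '
long_ = 'description" : "this value is longer than twenty-three characters" '

print(first_description(short))  # the quoted value, including the newline
print(first_description(long_))  # None, because the closing quote is out of reach
```

Returning None keeps the caller in control of what to do with over-long values, instead of failing on the index.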
# First just creating some test JSON
import json
data = {
    'items': [
        {
            'description': 'A "good" thing',
            # This is ignored because I'm assuming we only want the exact key 'description'
            'full_description': 'Not a good thing'
        },
        {
            # \\/ gives the same value as \/ without the invalid-escape warning
            'description': 'Test some slashes: \\ \\\\ \" // \\/ \n\r',
        },
    ]
}
j = json.dumps(data)
print(j)
# The actual code
import re
pattern = r'"description"\s*:\s*("(?:\\"|[^"])*?")'
descriptions = [
    # I'm using json.loads just to parse the matched string to interpret
    # escapes properly. If this is not acceptable then ast.literal_eval
    # will probably also work
    json.loads(d)
    for d in re.findall(pattern, j)
]
# Testing that it works
assert descriptions == [item['description'] for item in data['items']]
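One edge case worth checking: the pattern above tries \\" before anything else, so a value that ends in an escaped backslash (\\ directly before the closing quote) can have its closing quote swallowed, and the match runs on to a later quote. A possible variation, sketched here under the assumption that the input is otherwise valid JSON, consumes every backslash escape as a two-character pair. The helper name extract_descriptions and the sample data are hypothetical.

```python
import json
import re

# Any backslash escape is consumed as a pair (\\.), so an escaped backslash
# right before the closing quote cannot be mistaken for an escaped quote.
DESCRIPTION_RE = re.compile(r'"description"\s*:\s*("(?:\\.|[^"\\])*")')


def extract_descriptions(raw_json):
    """Return the decoded value of every exact "description" key in raw_json."""
    # json.loads is used only to decode the escapes of each matched string.
    return [json.loads(m) for m in DESCRIPTION_RE.findall(raw_json)]


# Hypothetical test data: the first value ends in a single backslash.
sample = json.dumps({
    'items': [
        {'description': 'ends with a backslash \\', 'full_description': 'ignored'},
        {'description': 'A "good" thing'},
    ]
})

print(extract_descriptions(sample))
```

As in the code above, the leading quote in "description" is what keeps full_description from matching.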