Extract JSON values using just regex


醉梦人生

I have a description field that is embedded within JSON, and I'm unable to use JSON libraries to parse this data.

I use {0,23} in order in attempt


2 Answers

感情败类

2020-12-11 12:46
You could try this code out:
```
import re

description = "description\" : \"this is a tesdt \n another test\" "

result = re.findall(r'(?<=description")(?:\s*\:\s*)(".*?(?=")")', description, re.IGNORECASE | re.DOTALL)[0]

print(result)
```
This gives the following result:
```
"this is a tesdt 
 another test"
```
That is essentially:
```
\"this is a tesdt \n another test\"
```
This is what you asked for in the comments.

Explanation:

`(?<=description")` is a positive look-behind: the match must be preceded by `description"`, but that text is not included in the match.
`(?:\s*\:\s*)` is a non-capturing group: `description"` must be followed by zero or more whitespace characters, a colon (`:`), and again zero or more whitespace characters.
`(".*?(?=")")` is the actual match desired: an opening double quote, zero or more characters, and a closing double quote. The `*?` is lazy, so the match ends at the first double quote after the opening one, and `re.DOTALL` lets `.` match the newline inside the value. A `{0,23}` bound would not work for this sample, because the value is 30 characters long and the closing quote would be out of reach.
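One limitation of the pattern explained above: the final `(?=")"` accepts the first double quote after the opening one, and the lazy quantifier makes the match stop there. In JSON text an embedded quote is written as `\"`, so a value such as `A "good" thing` gets cut short. A quick check, assuming the input is JSON text as `json.dumps` would write it (the sample string is made up for this demo), with both a bounded and an unbounded middle:

```python
import re

# JSON text as json.dumps would write it for the value: A "good" thing
text = r'{"description": "A \"good\" thing"}'

# The look-behind pattern, once with the 0-23 bound and once unbounded.
patterns = [
    r'(?<=description")(?:\s*\:\s*)(".{0,23}?(?=")")',
    r'(?<=description")(?:\s*\:\s*)(".*?(?=")")',
]

results = [re.findall(p, text, re.IGNORECASE | re.DOTALL)[0] for p in patterns]

# Both captures end at the escaped quote right after "A \",
# not at the real closing quote of the value.
for captured in results:
    print(captured)
```

The pattern in the next answer, `(?:\\"|[^"])*?`, handles this case by consuming `\"` as one unit.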

长发绾君心

2020-12-11 12:48

```
# First just creating some test JSON

import json

data = {
    'items': [
        {
            'description': 'A "good" thing',

            # This is ignored because I'm assuming we only want the exact key 'description'
            'full_description': 'Not a good thing'
        },
        {
            'description': 'Test some slashes: \\ \\\\ \" // \\/ \n\r',
        },
    ]
}

j = json.dumps(data)

print(j)

# The actual code

import re

# Match the exact key "description", then capture the quoted value,
# treating an escaped quote (\") as part of the value.
pattern = r'"description"\s*:\s*("(?:\\"|[^"])*?")'
descriptions = [

    # I'm using json.loads just to parse the matched string to interpret
    # escapes properly. If this is not acceptable then ast.literal_eval
    # will probably also work
    json.loads(d)
    for d in re.findall(pattern, j)]

# Testing that it works

assert descriptions == [item['description'] for item in data['items']]
```
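The answer above decodes each match with `json.loads`, but the question says no JSON library is available. Below is a minimal sketch under that assumption; all names in it are hypothetical, chosen for illustration. It changes two things. First, the string pattern `"(?:\\.|[^"\\])*"` treats every backslash as the start of a two-character escape, so a value that ends in an escaped backslash (JSON text `\\"`) is closed at the right quote; the lazy `(?:\\"|[^"])*?` above reads that `\"` as an escaped quote and runs on to a later quote. Second, the escapes are decoded with `re.sub` following the JSON string escape rules (RFC 8259, section 7) instead of `json.loads`.

```python
import re

# A JSON string literal: opening quote, then any run of either a backslash
# escape (backslash plus the next character) or a character that is neither
# a quote nor a backslash, then the closing quote.
JSON_STRING = r'"(?:\\.|[^"\\])*"'

# Capture the literal that follows the exact key "description".
DESCRIPTION_PATTERN = re.compile(r'"description"\s*:\s*(' + JSON_STRING + r')')

# The single-character escapes JSON allows after a backslash.
SIMPLE_ESCAPES = {
    '"': '"', '\\': '\\', '/': '/',
    'b': '\b', 'f': '\f', 'n': '\n', 'r': '\r', 't': '\t',
}

# One escape sequence: \uXXXX or one of the simple escapes above.
ESCAPE = re.compile(r'\\(u[0-9a-fA-F]{4}|["\\/bfnrt])')


def unescape_json_string(literal):
    """Decode a quoted JSON string literal without the json module."""
    # Limitation of this sketch: a surrogate pair (two consecutive \u escapes
    # encoding one character outside the BMP) comes out as two code points.
    def replace(match):
        esc = match.group(1)
        if esc.startswith('u'):
            return chr(int(esc[1:], 16))
        return SIMPLE_ESCAPES[esc]

    # Strip the surrounding quotes, then replace escapes left to right.
    return ESCAPE.sub(replace, literal[1:-1])


def extract_descriptions(text):
    return [unescape_json_string(s) for s in DESCRIPTION_PATTERN.findall(text)]


# Hypothetical JSON text with an escaped quote, a trailing escaped
# backslash, and a few other escapes.
sample = (
    r'{"items": [{"description": "A \"good\" thing", "full_description": "x"}, '
    r'{"description": "ends with a backslash \\"}, '
    r'{"description": "tab\tand slash \/ caf\u00e9"}]}'
)

print(extract_descriptions(sample))
```

Because `re.sub` scans left to right without overlapping matches, an escaped backslash followed by `n` (JSON text `\\n`) decodes to a backslash and an `n`, not to a newline.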
