Extract json values using just regex

醉梦人生 2020-12-11 12:19

I have a description field that is embedded within JSON, and I'm unable to use JSON libraries to parse this data.

I use {0,23} in order in attempt

2 answers
  • 2020-12-11 12:46

    You could try this code out:

    import re
    
    description = "description\" : \"this is a tesdt \n another test\" "
    
    # The quoted value above is 30 characters long, so the {0,23} bound from the
    # question finds no match here; an unbounded lazy quantifier is used instead.
    result = re.findall(r'(?<=description")(?:\s*\:\s*)(".*?(?=")")', description, re.IGNORECASE | re.DOTALL)[0]
    
    print(result)
    

    Which gives you the result of:

    "this is a tesdt 
     another test"
    

    Which is essentially:

    \"this is a tesdt \n another test\"
    

    And is what you have asked for in the comments.


    Explanation -

    (?<=description") is a positive look-behind that tells the regex to match only text preceded by description"
    (?:\s*\:\s*) is a non-capturing group that tells the regex that description" will be followed by zero or more whitespace characters, a colon (:) and again zero or more whitespace characters.
    (".*?(?=")") is the actual match desired: an opening double quote ("), as few characters as possible (re.DOTALL lets . match the newline as well), and a closing double quote ("). It stops at the first double quote it meets, so a value containing an escaped \" would be cut short.
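To make the effect of a length bound such as {0,23} concrete, here is a minimal sketch. The helper name extract_description and its max_len parameter are hypothetical, introduced only for illustration, and the sketch swaps the dot-with-DOTALL approach for a negated character class ([^"]), which matches newlines without any flag. Like the answer's pattern, it does not handle escaped quotes inside the value.

```python
import re


def extract_description(text, max_len=None):
    """Return the quoted value that follows description", or None.

    max_len is a hypothetical parameter: when given, the value may hold at
    most that many characters, mirroring a {0,N} bound in the pattern.
    """
    bound = "*" if max_len is None else "{0,%d}" % max_len
    # Look-behind for the key, then optional whitespace, a colon, optional
    # whitespace, and finally the quoted value as capture group 1.
    pattern = r'(?<=description")\s*:\s*("[^"]' + bound + r'")'
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group(1) if match else None


sample = 'description" : "this is a tesdt \n another test" '

unbounded = extract_description(sample)           # no length limit
bounded = extract_description(sample, max_len=23)  # value is too long

print(repr(unbounded))
print(repr(bounded))
```

With the bound in place, a value longer than the limit is not truncated; the pattern simply fails to match, which is why the bounded call returns None for this sample.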

  • 2020-12-11 12:48
    # First just creating some test JSON
    
    import json
    
    data = {
        'items': [
            {
                'description': 'A "good" thing',
    
                # This is ignored because I'm assuming we only want the exact key 'description'
                'full_description': 'Not a good thing'
            },
            {
                'description': 'Test some slashes: \\ \\\\ \" // \\/ \n\r',
            },
        ]
    }
    
    j = json.dumps(data)
    
    print(j)
    
    # The actual code
    
    import re
    
    pattern = r'"description"\s*:\s*("(?:\\"|[^"])*?")'
    descriptions = [
    
        # I'm using json.loads just to parse the matched string to interpret
        # escapes properly. If this is not acceptable then ast.literal_eval
        # will probably also work
        json.loads(d)
        for d in re.findall(pattern, j)]
    
    # Testing that it works
    
    assert descriptions == [item['description'] for item in data['items']]
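One edge case the test data above does not cover is a value that ends in a backslash. In the serialized JSON it appears as \\ followed by the closing quote, and the alternative \\" in the pattern above can read the second backslash plus the closing quote as an escaped quote. The sketch below compares that pattern with a stricter variant that consumes every backslash escape as a pair. The names ORIGINAL and STRICT are made up for this illustration, and the stricter pattern is a suggestion rather than part of the answer.

```python
import json
import re

# Pattern from the answer above: an escaped quote, or any non-quote character.
ORIGINAL = r'"description"\s*:\s*("(?:\\"|[^"])*?")'

# Stricter variant (an assumption, not from the answer): any backslash escape
# taken as a two-character pair, or any character that is neither a quote
# nor a backslash.
STRICT = r'"description"\s*:\s*("(?:\\.|[^"\\])*")'

# A description whose value ends with a single backslash, followed by
# another key so that there is a later quote to run into.
doc = json.dumps({"description": "ends with a backslash \\", "other": "x"})

original_raw = re.findall(ORIGINAL, doc)
strict_raw = re.findall(STRICT, doc)

# Only the strict matches are valid JSON string literals in this case.
strict_values = [json.loads(s) for s in strict_raw]

print(original_raw)
print(strict_values)
```

Here the original pattern overshoots into the next key, so passing its match to json.loads would raise an error, while the stricter variant stops at the real closing quote.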
    