Python - Parsing JSON formatted text file with regex

没有蜡笔的小新 2020-12-22 09:34

I have a text file formatted like a JSON file; however, everything is on a single line (it could be a MongoDB file). Could someone please point me in the direction of how I could

4 answers
  •  隐瞒了意图╮
    2020-12-22 09:51

    You can use Python's walk method and check each entry with re.match.
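
    As a minimal sketch of that idea, assuming the text is valid JSON and reading "walk" as a recursive traversal of the parsed structure (the walk helper and the shortened sample below are hypothetical, not part of the original data):

```python
import json
import re


def walk(node):
    """Yield every (key, value) pair found at any depth of parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key, value
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


# Hypothetical, shortened sample; a real file would be read from disk instead.
sample = (
    '{"d": {"fileAssetId": "034b9317-60d9-45c2-b6d6-0f24b59e1991", '
    '"filename": "Reports.pdf"}, "id": 3041}'
)

data = json.loads(sample)

# Check each key with re.match and keep the values whose key is fileAssetId.
asset_ids = [value for key, value in walk(data) if re.match(r"fileAssetId$", key)]
print(asset_ids)
```

    This only works when json.loads accepts the whole text; a truncated or concatenated dump raises json.JSONDecodeError.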

    If the string you have cannot be converted to a Python dict, you can use just a regex:

    print(re.match(r'.*fileAssetId\":\"([^\"]+)\".*', your_string).group(1))
    

    Solution for your example:

    import re
    
    # Raw string, so the JSON-escaped "\/" sequences are kept verbatim without escape warnings.
    example_string = r'{"d":{"__type":"WikiFileNodeContent:http:\/\/samplesite.com.u\/ns\/business\/wiki","author":null,"description":null,"fileAssetId":"034b9317-60d9-45c2-b6d6-0f24b59e1991","filename":"Reports.pdf"},"createdBy":1531,"createdByUsername":"John Cash","icon":"\/Assets10.37.5.0\/pix\/16x16\/page_white_acrobat.png","id":3041,"inheritedPermissions":false,"name":"map","permissions":[23,87,35,49,65],"type":3,"viewLevel":2},{"__type":"WikiNode:http:\/\/samplesite.com.au\/ns\/business\/wiki","children":[],"content"'
    
    regex_pattern = r'.*fileAssetId\":\"([^\"]+)\".*'
    match = re.match(regex_pattern, example_string)
    fileAssetId = match.group(1)
    print('fileAssetId: {}'.format(fileAssetId))
    

    Executing this yields:

    fileAssetId: 034b9317-60d9-45c2-b6d6-0f24b59e1991
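
    Note that the leading .* in that pattern is greedy, so when the line holds several records re.match returns only the last fileAssetId. A hedged sketch for collecting all of them with re.findall, using a made-up two-record line (the ids and filenames here are placeholders, not real data):

```python
import re

# Hypothetical single-line text containing two records.
text = (
    '{"fileAssetId":"aaa-111","filename":"a.pdf"},'
    '{"fileAssetId":"bbb-222","filename":"b.pdf"}'
)

# Without the surrounding .*, the pattern matches one key/value pair at a time.
pattern = re.compile(r'"fileAssetId":"([^"]+)"')

# findall returns the captured group for every non-overlapping match.
all_ids = pattern.findall(text)
print(all_ids)
```

    The same compiled pattern can be used with pattern.finditer(text) if the match positions are needed as well.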
    
