Python: how to convert single quotes to double quotes to format as a JSON string

粉色の甜心 2021-01-18 11:13

I have a file where on each line I have text like this (representing cast of a film):

[{\'cast_id\': 23, \'character\': \"Roger \'Verbal\' Kint\", \'credit_i         


        
3 Answers
  • 2021-01-18 11:48

    First of all, the line you gave as an example is not parsable! … 'Edie's Finneran' … contains a syntax error, no matter what.

    Assuming that you have control over the input, you could simply use eval() to read in the file. (Although, in that case one would wonder why you can't produce valid JSON in the first place…)

    >>> f = open('list.txt', 'r')
    >>> s = f.read().strip()
    >>> l = eval(s)
    
    >>> import pprint
    >>> pprint.pprint(l)
    [{'cast_id': 23,
      'character': "Roger 'Verbal' Kint",
      ...
      'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]
    
    >>> import json
    >>> json.dumps(l)
    '[{"cast_id": 23, "character": "Roger \'Verbal\' Kint", "credit_id": "52fe4260ca36847f8019af7", "gender": 2, "id": 1979, "name": "Kevin Spacey", "order": 5, "rofile_path": "/x7wF050iuCASefLLG75s2uDPFUu.jpg"}, {"cast_id": 27, "character":"Edie\'s Finneran", "credit_id": "52fe4260c3a36847f8019b07", "gender": 1, "id":2179, "name": "Suzy Amis", "order": 6, "profile_path": "/b1pjkncyLuBtMUmqD1MztDSG80.jpg"}]'
    

    If you don't have control over the input, this is very dangerous, as it opens you up to code injection attacks.
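
    To make the risk concrete, here is a minimal sketch of a safer parse using ast.literal_eval, which accepts only plain literals and refuses anything that would call code. The helper name parse_untrusted and the sample strings are made up for illustration; they are not from the question.

```python
import ast


def parse_untrusted(text):
    """Parse a Python-literal string without executing any code.

    Returns the parsed object, or None if the text is not a plain literal.
    (parse_untrusted is a hypothetical helper name.)
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


# A plain literal is parsed into a list of dicts.
safe = parse_untrusted("[{'cast_id': 23, 'name': 'Kevin Spacey'}]")

# An expression containing a function call is rejected instead of being run;
# eval() would have executed it.
malicious = parse_untrusted("__import__('os').getcwd()")

print(safe)
print(malicious)
```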

    I cannot emphasize enough that the best solution would be to produce valid JSON in the first place.
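
    As a rough illustration of what "valid JSON in the first place" means on the writing side: the single-quoted text in the question looks like what str() produces for a Python list, whereas json.dumps() writes something any JSON parser can read back. The sample record and the is_valid_json helper below are assumptions for the sketch.

```python
import json

# Hypothetical record, shaped like the cast entries in the question.
cast = [{"cast_id": 23, "character": "Roger 'Verbal' Kint"}]

as_repr = str(cast)         # Python repr: single-quoted keys, not JSON
as_json = json.dumps(cast)  # double-quoted keys and strings, valid JSON


def is_valid_json(text):
    """Return True if text can be decoded by json.loads."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(as_repr))
print(is_valid_json(as_json))
```

    Writing with json.dumps() (or json.dump() to a file) at the source removes the need for any quote conversion later.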

  • 2021-01-18 11:57

    Here is the code to get the desired output:

    import ast
    def getJson(filepath):
        fr = open(filepath, 'r')
        lines = []
        for line in fr.readlines():
            line_split = line.split(",")
            set_line_split = []
            for i in line_split:
                i_split = i.split(":")
                i_set_split = []
                for split_i in i_split:
                    set_split_i = ""
                    rev = ""
                    i = 0
                    for ch in split_i:
                        if ch in ['\"','\'']:
                            set_split_i += ch
                            i += 1
                            break
                        else:
                            set_split_i += ch
                            i += 1
                    i_rev = (split_i[i:])[::-1]
                    state = False
                    for ch in i_rev:
                        if ch in ['\"','\''] and state == False:
                            rev += ch
                            state = True
                        elif ch in ['\"','\''] and state == True:
                            rev += ch+"\\"
                        else:
                            rev += ch
                    i_rev = rev[::-1]
                    set_split_i += i_rev
                    i_set_split.append(set_split_i)
                set_line_split.append(":".join(i_set_split))
            line_modified = ",".join(set_line_split)
            lines.append(ast.literal_eval(line_modified))
        fr.close()  # release the file handle once every line is parsed
        return lines
    lines = getJson('test.txt')
    for i in lines:
        print(i)
    
  • 2021-01-18 11:58

    Apart from eval() (mentioned in user3850's answer), you can use ast.literal_eval.

    This has been discussed in the thread: Using python's eval() vs. ast.literal_eval()?
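
    A minimal sketch of that approach, assuming each line of the file holds one complete Python literal: parse it with ast.literal_eval, then re-serialize it with json.dumps. The function name lines_to_json and the sample line are illustrative only.

```python
import ast
import json


def lines_to_json(lines):
    """Convert lines holding Python literals into JSON strings.

    Blank lines are skipped. A malformed line (for example one with an
    unescaped quote inside a single-quoted string) makes ast.literal_eval
    raise an error, which is deliberately left to propagate here.
    """
    converted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        converted.append(json.dumps(ast.literal_eval(line)))
    return converted


# Hypothetical sample line; with a real file, pass the open file object.
sample = ["[{'cast_id': 23, 'character': \"Roger 'Verbal' Kint\"}]\n"]
result = lines_to_json(sample)
print(result[0])
```

    With a real file this could be called as lines_to_json(open_file) inside a with open(...) block, since iterating over a file object yields its lines.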

    You can also look at the following discussion threads from a Kaggle competition whose data is similar to what the OP describes:

    https://www.kaggle.com/c/tmdb-box-office-prediction/discussion/89313#latest-517927
    https://www.kaggle.com/c/tmdb-box-office-prediction/discussion/80045#latest-518338
