I have a file where on each line I have text like this (representing cast of a film):
[{\'cast_id\': 23, \'character\': \"Roger \'Verbal\' Kint\", \'credit_i
First of all, the line you gave as example is not parsable! … 'Edie's Finneran' …
contains a syntax error, not matter what.
Assuming that you have control over the input, you could simply use eval()
to read in the file. (Although, in that case one would wonder why you can't produce valid JSON in the first place…)
>>> f = open('list.txt', 'r')
>>> s = f.read().strip()
>>> l = eval(s)
>>> import pprint
>>> pprint.pprint(l)
[{'cast_id': 23,
'character': "Roger 'Verbal' Kint",
...
'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]
>>> import json
>>> json.dumps(l)
'[{"cast_id": 23, "character": "Roger \'Verbal\' Kint", "credit_id": "52fe4260ca36847f8019af7", "gender": 2, "id": 1979, "name": "Kevin Spacey", "order": 5, "rofile_path": "/x7wF050iuCASefLLG75s2uDPFUu.jpg"}, {"cast_id": 27, "character":"Edie\'s Finneran", "credit_id": "52fe4260c3a36847f8019b07", "gender": 1, "id":2179, "name": "Suzy Amis", "order": 6, "profile_path": "/b1pjkncyLuBtMUmqD1MztDSG80.jpg"}]'
If you don't have control over the input, this is very dangerous, as it opens you up to code injection attacks.
I cannot emphasize enough that the best solution would be to produce valid JSON in the first place.
import ast
def getJson(filepath):
fr = open(filepath, 'r')
lines = []
for line in fr.readlines():
line_split = line.split(",")
set_line_split = []
for i in line_split:
i_split = i.split(":")
i_set_split = []
for split_i in i_split:
set_split_i = ""
rev = ""
i = 0
for ch in split_i:
if ch in ['\"','\'']:
set_split_i += ch
i += 1
break
else:
set_split_i += ch
i += 1
i_rev = (split_i[i:])[::-1]
state = False
for ch in i_rev:
if ch in ['\"','\''] and state == False:
rev += ch
state = True
elif ch in ['\"','\''] and state == True:
rev += ch+"\\"
else:
rev += ch
i_rev = rev[::-1]
set_split_i += i_rev
i_set_split.append(set_split_i)
set_line_split.append(":".join(i_set_split))
line_modified = ",".join(set_line_split)
lines.append(ast.literal_eval(str(line_modified)))
return lines
lines = getJson('test.txt')
for i in lines:
print(i)
Apart from eval() (mentioned in user3850's answer), you can use ast.literal_eval
This has been discussed in the thread: Using python's eval() vs. ast.literal_eval()?
You can also look at the following discussion threads from Kaggle competition which has data similar to the one mentioned by OP:
https://www.kaggle.com/c/tmdb-box-office-prediction/discussion/89313#latest-517927 https://www.kaggle.com/c/tmdb-box-office-prediction/discussion/80045#latest-518338