Loading Large Twitter JSON Data (7GB+) into Python

Submitted by 不羁岁月 on 2019-12-01 10:46:30

I'm a very new user, but I might be able to offer a partial solution. I believe your formatting is off: you can't import the file as JSON unless it is actually valid JSON. You should be able to fix this if you can get the tweets into a data frame (or separate data frames) and then use the "DataFrame.to_json" method. You will need pandas if it isn't already installed.

Pandas - http://pandas.pydata.org/pandas-docs/stable/10min.html

DataFrame.to_json - http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html
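As a minimal sketch of what I mean (the column names and file name here are just placeholders, not from your data), "DataFrame.to_json" with orient="records" and lines=True writes exactly the one-object-per-line format described below:

```python
import json
import pandas as pd

# Hypothetical tiny DataFrame standing in for your tweet data
df = pd.DataFrame({
    "id": [1, 2],
    "text": ["hello", "world"],
})

# Write one JSON object per line ("JSON Lines" format)
df.to_json("tweets.jsonl", orient="records", lines=True)

# Read it back one object at a time, as you would with the full 7GB file
with open("tweets.jsonl") as f:
    tweets = [json.loads(line) for line in f if line.strip()]
```

Note that lines=True is only valid together with orient="records".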

Instead of having the entire file as a JSON object, put one JSON object per line for large datasets!

To fix the formatting, you should:

  1. Remove the [ at the start of the file
  2. Remove the ] at the end of the file
  3. Remove the comma at the end of each line
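Editing a 7GB file by hand isn't practical, so the three steps above can be sketched as a small streaming script. This assumes the input really does have one object per line inside the [ ... ] wrapper; the file names are placeholders:

```python
import json

# Create a tiny sample file in the bracketed format described above
# (stands in for the real 7GB input)
with open("tweets_array.json", "w") as f:
    f.write('[\n{"id": 1},\n{"id": 2}\n]\n')

# Stream the array-style file into a one-object-per-line file
with open("tweets_array.json", "r") as infile, \
        open("one_json_per_line.txt", "w") as outfile:
    for line in infile:
        line = line.strip()
        # Step 1 and 2: skip the opening '[' and closing ']'
        if line in ("[", "]") or not line:
            continue
        # Step 3: drop the trailing comma separating array elements
        if line.endswith(","):
            line = line[:-1]
        outfile.write(line + "\n")
```

Because it processes one line at a time, memory use stays flat regardless of file size.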

Then you can read the file line by line, like so:

import json

with open('one_json_per_line.txt', 'r') as infile:
    for line in infile:
        # Each line is now a complete, valid JSON object (one tweet)
        data_row = json.loads(line)

I would suggest using a different storage backend if possible. SQLite comes to mind.
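As a sketch of that idea (the table name, columns, and sample lines are my own invention, not from your data), you could stream the line-delimited file into SQLite once, then query it later without re-reading 7GB:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute("CREATE TABLE tweets (id INTEGER, text TEXT)")

# Hypothetical sample lines; in practice, iterate over the file instead
lines = ['{"id": 1, "text": "hello"}', '{"id": 2, "text": "world"}']
for line in lines:
    obj = json.loads(line)
    conn.execute("INSERT INTO tweets VALUES (?, ?)", (obj["id"], obj["text"]))
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
```

After that, filtering or aggregating tweets is a SQL query instead of a full file scan.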
