I've set up a public stream via AWS to collect tweets and now want to do some preliminary analysis. All my data is stored in an S3 bucket (in 5 MB files).
I downlo
Instead of having the entire file as a JSON object, put one JSON object per line for large datasets!
To fix the formatting, you should remove the [ at the start of the file and the ] at the end of the file.
Then you can read the file like so:
import json

with open('one_json_per_line.txt', 'r') as infile:
    for line in infile:
        # each line holds exactly one JSON object
        data_row = json.loads(line)
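If editing the files by hand is not practical, the conversion to one object per line could be scripted. The following is only a sketch: it assumes each downloaded file is a valid JSON array that fits in memory (reasonable for files of a few MB), and the function names array_to_json_lines and read_json_lines are made up for illustration.

```python
import json


def array_to_json_lines(src_path, dst_path):
    """Rewrite a file holding one JSON array as one JSON object per line.

    Assumption: src_path contains a single, valid JSON array.
    Returns the number of records written.
    """
    with open(src_path, 'r', encoding='utf-8') as infile:
        records = json.load(infile)

    with open(dst_path, 'w', encoding='utf-8') as outfile:
        for record in records:
            # json.dumps escapes newlines inside strings, so every
            # record ends up on exactly one physical line.
            outfile.write(json.dumps(record) + '\n')
    return len(records)


def read_json_lines(path):
    """Yield one parsed object per non-empty line of the file."""
    with open(path, 'r', encoding='utf-8') as infile:
        for line in infile:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)
```

Because read_json_lines is a generator, only one record is held in memory at a time while you loop over it.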
I would suggest using a different storage if possible. SQLite comes to mind.
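As a rough idea of what that could look like with Python's built-in sqlite3 module, here is a minimal sketch. The table layout, the function names store_tweets and load_tweets, and the assumption that every tweet is a dict with an integer "id" key are all mine, not something from the question.

```python
import json
import sqlite3


def store_tweets(db_path, tweets):
    """Store each tweet as a JSON string, keyed by its id.

    Assumption: every tweet is a dict with an integer "id" key.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS tweets "
                "(id INTEGER PRIMARY KEY, raw TEXT NOT NULL)"
            )
            # INSERT OR REPLACE keeps the latest copy if an id repeats
            conn.executemany(
                "INSERT OR REPLACE INTO tweets (id, raw) VALUES (?, ?)",
                [(t["id"], json.dumps(t)) for t in tweets],
            )
    finally:
        conn.close()


def load_tweets(db_path):
    """Return all stored tweets as dicts, ordered by id."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT raw FROM tweets ORDER BY id").fetchall()
    finally:
        conn.close()
    return [json.loads(raw) for (raw,) in rows]
```

Keeping the raw JSON in a text column means nothing is lost from the original tweet, and you can add proper columns later once you know which fields you actually query.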
I'm a VERY new user, but I might be able to offer a partial solution. I believe your formatting is off: you can't load the file as JSON unless its contents are actually valid JSON. You should be able to fix this if you can get the tweets into a data frame (or separate data frames) and then write them back out with the "DataFrame.to_json" method. You WILL need to install Pandas if you don't already have it.
Pandas - http://pandas.pydata.org/pandas-docs/stable/10min.html
DataFrame.to_json - http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html
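For what it's worth, here is roughly how the DataFrame route might look. This is a sketch that assumes the tweets are already available as a list of flat dicts and that Pandas is installed; the function names and the sample fields ("id", "text") are placeholders I picked for illustration.

```python
import pandas as pd


def tweets_to_json_lines(records, out_path):
    """Write a list of dicts out as one JSON object per line.

    Assumption: records is a list of flat dicts (one per tweet).
    """
    df = pd.DataFrame(records)
    # orient="records" with lines=True produces one JSON object per line
    df.to_json(out_path, orient="records", lines=True)
    return df


def load_json_lines(path):
    """Read a one-object-per-line JSON file back into a DataFrame."""
    return pd.read_json(path, lines=True)
```

Something like tweets_to_json_lines([{"id": 1, "text": "hello"}], "tweets.jsonl") followed by load_json_lines("tweets.jsonl") should give you the data back as a data frame.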