I have a JSON file I want to convert to a CSV file. How can I do this with Python?
I tried:
import json
import c
Since the data appears to be in a dictionary format, it would appear that you should actually use csv.DictWriter() to actually output the lines with the appropriate header information. This should allow the conversion to be handled somewhat easier. The fieldnames parameter would then set up the order properly while the output of the first line as the headers would allow it to be read and processed later by csv.DictReader().
For example, Mike Repass used
output = csv.writer(sys.stdout)
output.writerow(data[0].keys()) # header row
for row in data:
output.writerow(row.values())
However just change the initial setup to output = csv.DictWriter(filesetting, fieldnames=data[0].keys())
Note that since the order of elements in a dictionary is not defined, you might have to create fieldnames entries explicitly. Once you do that, the writerow will work. The writes then work as originally shown.
With the pandas
library, this is as easy as using two commands!
pandas.read_json()
To convert a JSON string to a pandas object (either a series or dataframe). Then, assuming the results were stored as df
:
df.to_csv()
Which can either return a string or write directly to a csv-file.
Based on the verbosity of previous answers, we should all thank pandas for the shortcut.
It'll be easy to use csv.DictWriter()
,the detailed implementation can be like this:
def read_json(filename):
return json.loads(open(filename).read())
def write_csv(data,filename):
with open(filename, 'w+') as outf:
writer = csv.DictWriter(outf, data[0].keys())
writer.writeheader()
for row in data:
writer.writerow(row)
# implement
write_csv(read_json('test.json'), 'output.csv')
Note that this assumes that all of your JSON objects have the same fields.
Here is the reference which may help you.
I am assuming that your JSON file will decode into a list of dictionaries. First we need a function which will flatten the JSON objects:
def flattenjson( b, delim ):
val = {}
for i in b.keys():
if isinstance( b[i], dict ):
get = flattenjson( b[i], delim )
for j in get.keys():
val[ i + delim + j ] = get[j]
else:
val[i] = b[i]
return val
The result of running this snippet on your JSON object:
flattenjson( {
"pk": 22,
"model": "auth.permission",
"fields": {
"codename": "add_message",
"name": "Can add message",
"content_type": 8
}
}, "__" )
is
{
"pk": 22,
"model": "auth.permission',
"fields__codename": "add_message",
"fields__name": "Can add message",
"fields__content_type": 8
}
After applying this function to each dict in the input array of JSON objects:
input = map( lambda x: flattenjson( x, "__" ), input )
and finding the relevant column names:
columns = [ x for row in input for x in row.keys() ]
columns = list( set( columns ) )
it's not hard to run this through the csv module:
with open( fname, 'wb' ) as out_file:
csv_w = csv.writer( out_file )
csv_w.writerow( columns )
for i_r in input:
csv_w.writerow( map( lambda x: i_r.get( x, "" ), columns ) )
I hope this helps!
Alec's answer is great, but it doesn't work in the case where there are multiple levels of nesting. Here's a modified version that supports multiple levels of nesting. It also makes the header names a bit nicer if the nested object already specifies its own key (e.g. Firebase Analytics / BigTable / BigQuery data):
"""Converts JSON with nested fields into a flattened CSV file.
"""
import sys
import json
import csv
import os
import jsonlines
from orderedset import OrderedSet
# from https://stackoverflow.com/a/28246154/473201
def flattenjson( b, prefix='', delim='/', val=None ):
if val is None:
val = {}
if isinstance( b, dict ):
for j in b.keys():
flattenjson(b[j], prefix + delim + j, delim, val)
elif isinstance( b, list ):
get = b
for j in range(len(get)):
key = str(j)
# If the nested data contains its own key, use that as the header instead.
if isinstance( get[j], dict ):
if 'key' in get[j]:
key = get[j]['key']
flattenjson(get[j], prefix + delim + key, delim, val)
else:
val[prefix] = b
return val
def main(argv):
if len(argv) < 2:
raise Error('Please specify a JSON file to parse')
print "Loading and Flattening..."
filename = argv[1]
allRows = []
fieldnames = OrderedSet()
with jsonlines.open(filename) as reader:
for obj in reader:
# print 'orig:\n'
# print obj
flattened = flattenjson(obj)
#print 'keys: %s' % flattened.keys()
# print 'flattened:\n'
# print flattened
fieldnames.update(flattened.keys())
allRows.append(flattened)
print "Exporting to CSV..."
outfilename = filename + '.csv'
count = 0
with open(outfilename, 'w') as file:
csvwriter = csv.DictWriter(file, fieldnames=fieldnames)
csvwriter.writeheader()
for obj in allRows:
# print 'allRows:\n'
# print obj
csvwriter.writerow(obj)
count += 1
print "Wrote %d rows" % count
if __name__ == '__main__':
main(sys.argv)
It is not a very smart way to do it, but I have had the same problem and this worked for me:
import csv
f = open('data.json')
data = json.load(f)
f.close()
new_data = []
for i in data:
flat = {}
names = i.keys()
for n in names:
try:
if len(i[n].keys()) > 0:
for ii in i[n].keys():
flat[n+"_"+ii] = i[n][ii]
except:
flat[n] = i[n]
new_data.append(flat)
f = open(filename, "r")
writer = csv.DictWriter(f, new_data[0].keys())
writer.writeheader()
for row in new_data:
writer.writerow(row)
f.close()