How can I convert JSON to CSV?

余生分开走 2020-11-21 22:32

I have a JSON file I want to convert to a CSV file. How can I do this with Python?

I tried:

import json
import c         


        
26 Answers
  • 2020-11-21 22:53

    Since the data appears to be a list of dictionaries, csv.DictWriter() is the better fit: it writes each row together with the appropriate header information, which makes the conversion easier. The fieldnames parameter fixes the column order, and writing the header as the first line lets csv.DictReader() read and process the file later.

    For example, Mike Repass used

    output = csv.writer(sys.stdout)
    
    output.writerow(data[0].keys())  # header row
    
    for row in data:
      output.writerow(row.values())
    

    To switch, change the initial setup to output = csv.DictWriter(filesetting, fieldnames=data[0].keys()), write the header with output.writeheader(), and pass each dictionary itself to output.writerow(row) rather than row.values().

    Note that before Python 3.7 the order of elements in a dictionary was not guaranteed, so you might have to list the fieldnames entries explicitly to get a stable column order.
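
    A minimal sketch of the DictWriter variant described above, assuming the decoded JSON is a list of flat dictionaries. The sample records and the helper name to_csv_text are made up for illustration, and the output goes to an in-memory buffer rather than a file:

```python
import csv
import io

def to_csv_text(data, fieldnames):
    """Render a list of flat dicts as CSV text with an explicit column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()        # header row, in the order given by fieldnames
    for row in data:
        writer.writerow(row)    # DictWriter takes the dict itself, not row.values()
    return buf.getvalue()

# Hypothetical sample data, not taken from the question
data = [
    {"pk": 22, "model": "auth.permission"},
    {"pk": 23, "model": "auth.group"},
]
print(to_csv_text(data, fieldnames=["model", "pk"]))
```

    Passing fieldnames explicitly means the column order does not depend on the key order of the first record.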

  • 2020-11-21 22:54

    With the pandas library, this is as easy as using two commands!

    pandas.read_json()
    

    This converts a JSON string to a pandas object (either a Series or a DataFrame). Then, assuming the result was stored as df:

    df.to_csv()
    

    This either returns a string or writes directly to a CSV file.

    Based on the verbosity of previous answers, we should all thank pandas for the shortcut.

  • 2020-11-21 22:58

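    This is easy with csv.DictWriter(). A detailed implementation could look like this:

    import csv
    import json

    def read_json(filename):
        # Use a context manager so the file is closed after reading
        with open(filename) as inf:
            return json.load(inf)

    def write_csv(data, filename):
        # newline='' stops the csv module from emitting blank lines on Windows
        with open(filename, 'w', newline='') as outf:
            writer = csv.DictWriter(outf, fieldnames=list(data[0].keys()))
            writer.writeheader()
            for row in data:
                writer.writerow(row)

    # Example usage
    write_csv(read_json('test.json'), 'output.csv')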
    

    Note that this assumes that all of your JSON objects have the same fields.

    Here is the reference which may help you.
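
    If the objects do not all share the same fields, one possible workaround is to collect the union of keys first and let DictWriter fill the gaps through its restval parameter. This is only a sketch: the helper names collect_fieldnames and rows_to_csv and the sample rows are assumptions for illustration.

```python
import csv
import io

def collect_fieldnames(rows):
    """Return every key seen in rows, in first-seen order."""
    seen = {}
    for row in rows:
        for key in row:
            seen.setdefault(key, None)
    return list(seen)

def rows_to_csv(rows):
    """Write rows with differing keys; missing values become empty cells."""
    fieldnames = collect_fieldnames(rows)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical rows with partially overlapping keys
rows = [{"a": 1, "b": 2}, {"a": 3, "c": 4}]
print(rows_to_csv(rows))
```

    Without the union step, a row containing a key that is missing from fieldnames makes DictWriter raise ValueError under its default extrasaction='raise' setting.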

  • 2020-11-21 22:59

    I am assuming that your JSON file will decode into a list of dictionaries. First we need a function which will flatten the JSON objects:

    def flattenjson( b, delim ):
        val = {}
        for i in b.keys():
            if isinstance( b[i], dict ):
                get = flattenjson( b[i], delim )
                for j in get.keys():
                    val[ i + delim + j ] = get[j]
            else:
                val[i] = b[i]
    
        return val
    

    The result of running this snippet on your JSON object:

    flattenjson( {
        "pk": 22, 
        "model": "auth.permission", 
        "fields": {
          "codename": "add_message", 
          "name": "Can add message", 
          "content_type": 8
        }
      }, "__" )
    

    is

    {
        "pk": 22, 
        "model": "auth.permission", 
        "fields__codename": "add_message", 
        "fields__name": "Can add message", 
        "fields__content_type": 8
    }
    

    After applying this function to each dict in the input array of JSON objects:

    input = [ flattenjson( x, "__" ) for x in input ]  # a list, so it can be iterated more than once on Python 3
    

    and finding the relevant column names:

    columns = [ x for row in input for x in row.keys() ]
    columns = list( set( columns ) )
    

    it's not hard to run this through the csv module:

    with open( fname, 'w', newline='' ) as out_file:  # text mode; 'wb' only works on Python 2
        csv_w = csv.writer( out_file )
        csv_w.writerow( columns )
    
        for i_r in input:
            csv_w.writerow( [ i_r.get( x, "" ) for x in columns ] )
    

    I hope this helps!
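
    For reference, the steps above can be run end to end in memory roughly as follows. This is a sketch under assumptions: the second record is invented to show a missing field, the function is renamed flatten, and the columns are sorted only to make the output order predictable.

```python
import csv
import io

def flatten(obj, delim="__"):
    """Flatten nested dicts into one level, joining keys with delim."""
    flat = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten(value, delim).items():
                flat[key + delim + sub_key] = sub_value
        else:
            flat[key] = value
    return flat

# The first record follows the question's shape; the second is hypothetical
records = [
    {"pk": 22, "model": "auth.permission",
     "fields": {"codename": "add_message", "content_type": 8}},
    {"pk": 23, "model": "auth.permission",
     "fields": {"codename": "change_message", "name": "Can change message"}},
]

rows = [flatten(r) for r in records]
# Union of all keys; sorted so the column order is deterministic
columns = sorted({key for row in rows for key in row})

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(columns)
for row in rows:
    writer.writerow([row.get(col, "") for col in columns])
print(buf.getvalue())
```

    Fields that a record lacks come out as empty cells, so records with different nested keys still line up under one header.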

  • 2020-11-21 22:59

    Alec's answer is great, but it doesn't work in the case where there are multiple levels of nesting. Here's a modified version that supports multiple levels of nesting. It also makes the header names a bit nicer if the nested object already specifies its own key (e.g. Firebase Analytics / BigTable / BigQuery data):

    """Converts JSON with nested fields into a flattened CSV file.
    """
    
    import sys
    import json
    import csv
    import os
    
    import jsonlines
    
    from orderedset import OrderedSet
    
    # from https://stackoverflow.com/a/28246154/473201
    def flattenjson( b, prefix='', delim='/', val=None ):
      if val is None:
        val = {}
    
      if isinstance( b, dict ):
        for j in b.keys():
          flattenjson(b[j], prefix + delim + j, delim, val)
      elif isinstance( b, list ):
        get = b
        for j in range(len(get)):
          key = str(j)
    
          # If the nested data contains its own key, use that as the header instead.
          if isinstance( get[j], dict ):
            if 'key' in get[j]:
              key = get[j]['key']
    
          flattenjson(get[j], prefix + delim + key, delim, val)
      else:
        val[prefix] = b
    
      return val
    
    def main(argv):
      if len(argv) < 2:
        raise ValueError('Please specify a JSON file to parse')
    
      print("Loading and Flattening...")
      filename = argv[1]
      allRows = []
      fieldnames = OrderedSet()
      with jsonlines.open(filename) as reader:
        for obj in reader:
          flattened = flattenjson(obj)
          fieldnames.update(flattened.keys())
          allRows.append(flattened)
    
      print("Exporting to CSV...")
      outfilename = filename + '.csv'
      count = 0
      # newline='' lets the csv module control line endings
      with open(outfilename, 'w', newline='') as file:
        csvwriter = csv.DictWriter(file, fieldnames=list(fieldnames))
        csvwriter.writeheader()
        for obj in allRows:
          csvwriter.writerow(obj)
          count += 1
    
      print("Wrote %d rows" % count)
    
    
    
    if __name__ == '__main__':
      main(sys.argv)
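
    To see which headers this flattening produces, here is a small self-contained trace that repeats the same traversal on one made-up record. The event record and the shortened name flatten are illustrative assumptions, and the third-party jsonlines and orderedset packages are left out so the snippet runs on the standard library alone.

```python
def flatten(node, prefix="", delim="/", out=None):
    """Flatten nested dicts and lists into a single dict of path -> value."""
    if out is None:
        out = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flatten(value, prefix + delim + key, delim, out)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            key = str(index)
            # A nested object that carries its own 'key' names its column
            if isinstance(item, dict) and "key" in item:
                key = item["key"]
            flatten(item, prefix + delim + key, delim, out)
    else:
        out[prefix] = node
    return out

# Hypothetical record in the key/value style mentioned above
event = {
    "name": "purchase",
    "params": [{"key": "currency", "value": "USD"}],
    "tags": ["a", "b"],
}
flat = flatten(event)
for path, value in flat.items():
    print(path, "=", value)
```

    Every header starts with the delimiter because the top-level prefix is empty, and list items without a 'key' entry fall back to their index.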
    
  • 2020-11-21 23:03

    It is not a very smart way to do it, but I have had the same problem and this worked for me:

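    import csv
    import json
    
    with open('data.json') as f:
        data = json.load(f)
    
    new_data = []
    
    for i in data:
        flat = {}
        for n in i.keys():
            if isinstance(i[n], dict):
                # Flatten one level of nesting into parent_child columns
                for ii in i[n].keys():
                    flat[n + "_" + ii] = i[n][ii]
            else:
                flat[n] = i[n]
        new_data.append(flat)
    
    filename = 'data.csv'  # assumed output path; the original left filename undefined
    # Open for writing ('w'); the original opened the file in read mode
    with open(filename, 'w', newline='') as f:
        writer = csv.DictWriter(f, new_data[0].keys())
        writer.writeheader()
        for row in new_data:
            writer.writerow(row)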
    