How to convert specific CSV format to JSON using Python

…衆ロ難τιáo~ submitted on 2020-01-07 04:57:07


I have downloaded a CSV file from Google Trends which presents data in this format:

Top cities for golden globes
City,golden globes
New York (United States),100
Los Angeles (United States),91
Toronto (Canada),69

Top regions for golden globes
Region,golden globes
United States,100

There are three or four of these groups, separated by blank lines. The first line of each group contains text I want to use as a key, and the data rows after the column-header line are the entries I need associated with that key. Can anyone recommend Python tools that would make this work? I'm not having much luck with Python's csv library.

My desired output from the above CSV would look like this:

{
    "Top cities for golden globes": {
        "New York (United States)": 100,
        "Los Angeles (United States)": 91,
        "Toronto (Canada)": 69
    },
    "Top regions for golden globes": {
        "United States": 100,
        "Canada": 91,
        "Ireland": 72,
        "Australia": 72
    }
}
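For reference, this is what Python's csv module actually yields for that input: each title line comes back as a one-field row and each blank separator line as an empty list, so a plain csv.reader loop has to keep track of which group it is in. A minimal sketch, using io.StringIO as a stand-in for the downloaded file:

```python
import csv
import io

# The sample from the question; io.StringIO stands in for the real CSV file
sample = """Top cities for golden globes
City,golden globes
New York (United States),100
Los Angeles (United States),91
Toronto (Canada),69

Top regions for golden globes
Region,golden globes
United States,100
"""

rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    print(row)
# ['Top cities for golden globes']
# ['City', 'golden globes']
# ['New York (United States)', '100']
# ['Los Angeles (United States)', '91']
# ['Toronto (Canada)', '69']
# []
# ['Top regions for golden globes']
# ['Region', 'golden globes']
# ['United States', '100']
```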


Your input format is so predictable that I would parse it by hand, without the csv library.

import json
from collections import defaultdict

result = defaultdict(dict)  # maps each group title to its {label: value} rows
current_key = ""            # title of the group currently being read
skip_header = False         # True right after a title line, to skip the column-header row

with open("yourfile.csv") as fh:
    for line in fh:
        line = line.strip()    # throw away the newline and surrounding spaces
        if line == "":         # a blank line ends the current group
            current_key = ""
            continue
        if current_key == "":  # first non-blank line of a group is its title
            current_key = line
            skip_header = True
            continue
        if skip_header:        # the line after the title ("City,golden globes") is ignored
            skip_header = False
            continue
        label, value = line.rsplit(",", 1)  # split on the last comma only
        result[current_key][label] = int(value)  # values are integers in the sample

# pretty-print the data
print(json.dumps(result, sort_keys=True, indent=4))
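If you would rather keep the csv module for the field splitting (for example, in case a label ever contains a quoted comma), one option is to cut the lines into blank-line-separated blocks with itertools.groupby and run csv.reader over each block. This is a sketch assuming the layout shown in the question (title line, column-header row, then data rows with integer values) and that no quoted field spans several lines; the function name parse_trends is hypothetical.

```python
import csv
import itertools
import json


def parse_trends(lines):
    """Hypothetical helper: parse blank-line-separated groups into {title: {label: value}}."""
    result = {}
    # groupby yields alternating runs of non-blank and blank lines
    for is_blank, group in itertools.groupby(lines, key=lambda line: line.strip() == ""):
        if is_blank:
            continue  # skip the separator lines between groups
        block = list(group)
        title = block[0].strip()       # first line of the block: the group title
        rows = csv.reader(block[2:])   # skip the title and the column-header row
        result[title] = {label: int(value) for label, value in rows}
    return result


sample = """Top cities for golden globes
City,golden globes
New York (United States),100
Los Angeles (United States),91
Toronto (Canada),69

Top regions for golden globes
Region,golden globes
United States,100
"""

data = parse_trends(sample.splitlines())
print(json.dumps(data, indent=4))
```

With a real file, the same function can take the file object directly, e.g. `with open("yourfile.csv", newline="") as fh: data = parse_trends(fh)`, since csv.reader copes with the trailing newlines.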


I'd try something like this:

import csv

row = []   # current CSV row; empty at the start and after a blank line
dd = {}    # result: {group title: {label: value}}
with open('the.csv', newline='') as f:
    r = csv.reader(f)
    while True:
        if row:  # normal case, non-empty data row
            d[row[0]] = int(row[1])
            row = next(r, None)
            if row is None: break
        else:  # row is empty at start and after a blank line
            category = next(f, None)  # the title line, read straight from the file
            if category is None: break
            category = category.strip()
            next(r)  # skip the column-header row
            d = dd[category] = {}
            row = next(r, None)
            if row is None: break

Now, dd should be the dict-of-dicts you want, and you can json.dump it as you wish.
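Writing the result to disk is a single json.dump call. A minimal sketch, where the small stand-in dd and the output filename trends.json are placeholders:

```python
import json

# Stand-in for the dict-of-dicts built by the loop above
dd = {"Top regions for golden globes": {"United States": 100}}

# Write it as indented JSON; ensure_ascii=False keeps non-ASCII place names readable
with open("trends.json", "w", encoding="utf-8") as out:
    json.dump(dd, out, indent=4, ensure_ascii=False)

# Reading the file back gives an equal dictionary
with open("trends.json", encoding="utf-8") as src:
    assert json.load(src) == dd
```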

