How to build a JSON file with nested records from a flat data table?


I'm looking for a Python technique to build a nested JSON file from a flat table in a pandas data frame. For example, how could a pandas data frame table such as:

2 Answers

盖世英雄少女心

2021-01-12 11:30

With some input from @root I used a different tack and came up with the following code, which seems to get most of the way there:

import pandas
import json
from collections import defaultdict

inputExcel = 'E:\\teamsMM.xlsx'
exportJson = 'E:\\teamsMM.json'

# Current pandas spells the keyword sheet_name and no longer accepts encoding here
data = pandas.read_excel(inputExcel, sheet_name='SCAT Teams')

grouped = data.groupby(['teamname', 'members']).first()

results = defaultdict(lambda: defaultdict(dict))

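for t in grouped.itertuples():
    # t.Index is the (teamname, members) pair; walk it to reach the innermost dict
    for i, key in enumerate(t.Index):
        if i == 0:
            nested = results[key]
        elif i == len(t.Index) - 1:
            nested[key] = t  # stores the whole namedtuple, Index included
        else:
            nested = nested[key]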


formattedJson = json.dumps(results, indent = 4)

formattedJson = '{\n"teams": [\n' + formattedJson +'\n]\n }'

# Use a context manager so the file is flushed and closed
with open(exportJson, "w") as parsed:
    parsed.write(formattedJson)

The resulting JSON file is this:

{
"teams": [
{
    "1": {
        "0": [
            [
                1, 
                0
            ], 
            "John", 
            "Doe", 
            "Anon", 
            "916-555-1234", 
            "none", 
            "john.doe@wildlife.net"
        ], 
        "1": [
            [
                1, 
                1
            ], 
            "Jane", 
            "Doe", 
            "Anon", 
            "916-555-4321", 
            "916-555-7890", 
            "jane.doe@wildlife.net"
        ]
    }, 
    "2": {
        "0": [
            [
                2, 
                0
            ], 
            "Mickey", 
            "Moose", 
            "Moosers", 
            "916-555-0000", 
            "916-555-1111", 
            "mickey.moose@wildlife.net"
        ], 
        "1": [
            [
                2, 
                1
            ], 
            "Minny", 
            "Moose", 
            "Moosers", 
            "916-555-2222", 
            "none", 
            "minny.moose@wildlife.net"
        ]
    }
}
]
 }

This format is very close to the desired end product. Two issues remain: removing the redundant array (such as [1, 0]) that appears just above each first name, and getting the headers for each nest to be "teamname": "1", "members": rather than "1": "0":

Also, I do not know why each record is being stripped of its heading during the conversion. For instance, why is the dictionary entry "firstname": "John" exported as just "John"?
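
A likely explanation for both remaining issues, judging from the code above: `itertuples()` yields namedtuples, and the standard `json` module encodes any tuple, named or not, as a JSON array, so the field names are dropped and the `Index` pair shows up as the leading `[1, 0]`. The sketch below reproduces this with a plain `namedtuple` standing in for a pandas row. The `Row` type and its two name fields are illustrative assumptions, not the real spreadsheet columns.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for one row produced by DataFrame.itertuples()
Row = namedtuple("Row", ["Index", "firstname", "lastname"])
row = Row(Index=(1, 0), firstname="John", lastname="Doe")

# Tuples, including namedtuples, are encoded as JSON arrays: field names are lost
as_array = json.dumps(row)

# Converting to a dict first keeps the headings; the index pair is dropped here
record = row._asdict()
record.pop("Index")
as_object = json.dumps(record)

print(as_array)
print(as_object)
```

Storing `t._asdict()` instead of `t` in the loop above would therefore be one way to keep the headings, though it does not by itself produce the "teamname"/"members" layout.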

猫巷女王i

2021-01-12 11:51

This is a solution that works and creates the desired JSON format. First, I grouped my dataframe by the appropriate columns. Then, instead of creating a dictionary (and losing data order) for each column heading/record pair, I created them as lists of tuples and transformed each list into an OrderedDict. Another OrderedDict was created for the two columns that everything else was grouped by. Precise layering between lists and OrderedDicts was necessary for the JSON conversion to produce the correct format. Also note that when dumping to JSON, sort_keys must be False, or all your OrderedDicts will be rearranged into alphabetical order.
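
To illustrate the point about key order, here is a minimal sketch using only the standard library. The `member` record is made-up sample data; the field names are assumptions chosen to resemble the output shown earlier.

```python
import json
from collections import OrderedDict

# Illustrative record built from a list of (heading, value) tuples
member = OrderedDict([
    ("firstname", "John"),
    ("lastname", "Doe"),
    ("email", "john.doe@wildlife.net"),
])

# Insertion order is preserved when keys are not sorted
in_order = json.dumps(member, sort_keys=False)

# sort_keys=True rearranges the keys alphabetically
alphabetical = json.dumps(member, sort_keys=True)

print(in_order)
print(alphabetical)
```

Note that `sort_keys` already defaults to False in `json.dumps`, so passing it explicitly mainly documents the intent. On Python 3.7 and later, plain dicts also keep insertion order.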

import pandas
import json
from collections import OrderedDict

inputExcel = 'E:\\teams.xlsx'
exportJson = 'E:\\teams.json'

# Current pandas spells the keyword sheet_name and no longer accepts encoding here
data = pandas.read_excel(inputExcel, sheet_name='SCAT Teams')

# This creates a tuple of column headings for later use matching them with column data
columnList = tuple(str(col) for col in data.columns)

#This groups the dataframe by the 'teamname' and 'members' columns
grouped = data.groupby(['teamname', 'members']).first()

# This gets the index levels of the grouped frame; level 0 holds the unique team names
groupnames = grouped.index.levels
tm = groupnames[0]

#Create a list to add team records to at the end of the first 'for' loop
teamsList = []

for teamN in tm:
    teamN = int(teamN)  # added to prevent "TypeError: 1 is not JSON serializable"
    tempList = []  # create a temporary list to add each record to
    for index, row in grouped.iterrows():
        dataRow = row
        if index[0] == teamN:  #Select the record in each row of the grouped dataframe if its index matches the team number

            # To keep the JSON fields in column order, pair each heading with its value and build an OrderedDict
            rowDict = OrderedDict(zip(columnList[2:8], dataRow.iloc[:6]))
            tempList.append(rowDict)
    # Create another OrderedDict so 'teamname' stays ahead of the list of members
    t = OrderedDict([('teamname', str(teamN)), ('members', tempList)])

    # Append the OrderedDict to the empty list of teams created earlier
    teamsList.append(t)


#Create a final dictionary with a single item: the list of teams
teams = {"teams":teamsList} 

# Dump to JSON format
formattedJson = json.dumps(teams, indent=1, sort_keys=False)  # sort_keys must stay False, or all dictionaries will be alphabetized
formattedJson = formattedJson.replace("NaN", '"NULL"')  # pandas writes missing values as NaN, which is not valid JSON, so replace it with the string "NULL"
print(formattedJson)

# Export to JSON file
with open(exportJson, "w") as parsed:
    parsed.write(formattedJson)

print("\n\nExport to JSON Complete")
