Python flatten multilevel/nested JSON


I am trying to convert JSON to a CSV file that I can use for further analysis. The issue with my structure is that I end up with quite a few nested dicts/lists when I convert my JSON file

7 answers

抹茶落季

2020-12-03 06:22

In case anyone else finds themselves here and is looking for a solution better suited to subsequent programmatic treatment:

Flattening lists into columns forces you to work out the headings from the list lengths and so on. I wanted a solution where, if there are e.g. two lists of two elements each, four rows are generated, one for each valid combination of values (see below for actual examples):

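from collections.abc import Mapping


class MapFlattener:
    """Flatten a nested mapping into (headings, rows), one row per combination of list elements."""

    def __init__(self):
        self.headings = []
        self.rows = []

    def add_rows(self, headings, rows):
        # Append the new columns, then cross every existing row with every new row.
        self.headings = [*self.headings, *headings]
        if self.rows:
            new_rows = []
            for base_row in self.rows:
                for row in rows:
                    new_rows.append([*base_row, *row])
            self.rows = new_rows
        else:
            # Nothing accumulated yet: the new rows become the starting rows.
            self.rows = rows

    def __call__(self, mapping):
        for heading, value in mapping.items():
            if isinstance(value, Mapping):
                # Nested mapping: flatten it recursively and prefix its headings with the parent key.
                sub_headings, sub_rows = MapFlattener()(value)
                sub_headings = [f'{heading}:{sub_heading}' for sub_heading in sub_headings]
                self.add_rows(sub_headings, sub_rows)
                continue

            if isinstance(value, list):
                # List: each element becomes its own single-column row.
                self.add_rows([heading], [[e] for e in value])
                continue

            # Scalar: a single one-column row.
            self.add_rows([heading], [[value]])

        return self.headings, self.rows


def map_flatten(mapping):
    return MapFlattener()(mapping)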

This creates output more in line with relational data:

In [22]: map_flatten({'l': [1,2]})
Out[22]: (['l'], [[1], [2]])

In [23]: map_flatten({'l': [1,2], 'n': 7})
Out[23]: (['l', 'n'], [[1, 7], [2, 7]])

In [24]: map_flatten({'l': [1,2], 'n': 7, 'o': {'a': 1, 'b': 2}})
Out[24]: (['l', 'n', 'o:a', 'o:b'], [[1, 7, 1, 2], [2, 7, 1, 2]])
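
The examples above do not show the two-list case mentioned earlier (two lists of two elements giving four rows). Below is a minimal, self-contained sketch of the same cross-product idea, written with itertools.product rather than the class above. The name flatten_product is hypothetical and used only for illustration, and the sketch assumes list elements are scalars, not further dicts or lists:

```python
from collections.abc import Mapping
from itertools import product


def flatten_product(mapping):
    """Return (headings, rows) with one row per combination of list elements.

    Hypothetical helper for illustration; not part of the answer's code.
    """
    headings = []
    column_groups = []  # one list of partial rows per top-level key
    for key, value in mapping.items():
        if isinstance(value, Mapping):
            # Nested mapping: recurse and prefix the sub-headings with the key.
            sub_headings, sub_rows = flatten_product(value)
            headings.extend(f"{key}:{h}" for h in sub_headings)
            column_groups.append(sub_rows)
        elif isinstance(value, list):
            # List: each element is a one-cell partial row.
            headings.append(key)
            column_groups.append([[e] for e in value])
        else:
            # Scalar: a single one-cell partial row.
            headings.append(key)
            column_groups.append([[value]])
    # Cross the partial rows of every key with those of every other key.
    rows = [
        [cell for part in combo for cell in part]
        for combo in product(*column_groups)
    ]
    return headings, rows


headings, rows = flatten_product({"l": [1, 2], "m": ["a", "b"]})
print(headings)
print(rows)
```

With two lists of two elements each, rows holds four entries, one per combination. Note that an empty list under any key yields no rows at all, since there is nothing to combine with.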

This is particularly useful if you are loading the CSV into a spreadsheet or similar tool and need to process the flattened data.
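
As a rough sketch of that last step, the headings and rows can be handed to the standard csv module. The data here is copied from the Out[24] example; the in-memory buffer is only a stand-in for a real file, and the file name in the comment is a hypothetical one:

```python
import csv
import io

# Headings and rows copied from the Out[24] example above.
headings = ['l', 'n', 'o:a', 'o:b']
rows = [[1, 7, 1, 2], [2, 7, 1, 2]]

# In-memory buffer as a stand-in for a real file, e.g.
# open('out.csv', 'w', newline='') with a file name of your choice.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headings)   # header line
writer.writerows(rows)      # one CSV line per flattened row

csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file, the csv documentation recommends opening it with newline='' so that line endings are not translated twice.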
