Python flatten multilevel/nested JSON

隐瞒了意图╮ 2020-12-03 05:23

I am trying to convert JSON to a CSV file that I can use for further analysis. The issue with my structure is that I end up with quite a few nested dicts/lists when I convert my JSON file

7 answers
  • 2020-12-03 06:22

    In case anyone else finds themselves here and is looking for a solution better suited to subsequent programmatic treatment:

    Flattening lists into extra columns means the headings then have to account for list lengths and so on. I wanted a solution where, if there are two lists of, e.g., two elements each, four rows are generated, one for each valid combination of values (see below for actual examples):

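    from collections.abc import Mapping


    class MapFlattener:
        """Flatten a nested mapping into (headings, rows), one row per combination of list elements."""

        def __init__(self):
            self.headings = []
            self.rows = []

        def add_rows(self, headings, rows):
            # Append the new columns, then pair every existing row with every new row.
            self.headings = [*self.headings, *headings]
            if self.rows:
                new_rows = []
                for base_row in self.rows:
                    for row in rows:
                        new_rows.append([*base_row, *row])
                self.rows = new_rows
            else:
                # First column(s): there is nothing to combine with yet.
                self.rows = rows

        def __call__(self, mapping):
            for heading, value in mapping.items():
                if isinstance(value, Mapping):
                    # Flatten the nested mapping and prefix its headings with the parent key.
                    sub_headings, sub_rows = MapFlattener()(value)
                    sub_headings = [f'{heading}:{sub_heading}' for sub_heading in sub_headings]
                    self.add_rows(sub_headings, sub_rows)
                    continue

                if isinstance(value, list):
                    # Each list element becomes its own one-column row.
                    self.add_rows([heading], [[e] for e in value])
                    continue

                # Scalar value: a single one-column row.
                self.add_rows([heading], [[value]])

            return self.headings, self.rows


    def map_flatten(mapping):
        return MapFlattener()(mapping)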
    

    This creates output more in line with relational data:

    In [22]: map_flatten({'l': [1,2]})
    Out[22]: (['l'], [[1], [2]])

    In [23]: map_flatten({'l': [1,2], 'n': 7})
    Out[23]: (['l', 'n'], [[1, 7], [2, 7]])

    In [24]: map_flatten({'l': [1,2], 'n': 7, 'o': {'a': 1, 'b': 2}})
    Out[24]: (['l', 'n', 'o:a', 'o:b'], [[1, 7, 1, 2], [2, 7, 1, 2]])
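
    The examples above each contain a single list. The two-list case described earlier (two lists of two elements giving four rows) amounts to a Cartesian product, which can be sketched on its own with itertools.product. This is not the MapFlattener code: cross_rows is a hypothetical helper name, and the sketch assumes a flat dict with no nested mappings.

```python
from itertools import product


def cross_rows(mapping):
    """Hypothetical helper: one row per combination of list elements.

    Assumes a flat dict; nested mappings are not handled here.
    """
    headings = list(mapping)
    # Wrap scalars in a one-element list so every column can be iterated.
    columns = [v if isinstance(v, list) else [v] for v in mapping.values()]
    # product() yields one tuple per combination, leftmost column varying slowest.
    rows = [list(combo) for combo in product(*columns)]
    return headings, rows


headings, rows = cross_rows({'l': [1, 2], 'm': ['x', 'y'], 'n': 7})
print(headings)
for row in rows:
    print(row)
```

    For this input the sketch yields four rows, in the same order the nested loops in add_rows would produce them.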
    

    This is particularly useful if you are going to open the CSV in a spreadsheet or otherwise need to process the flattened data further.
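
    As a minimal sketch of the final step, a (headings, rows) pair can be written out with the standard csv module. The values below are copied from the Out[24] example; writing to an in-memory buffer rather than a file is an assumption made to keep the example self-contained.

```python
import csv
import io

# Headings and rows copied from the Out[24] example above.
headings = ['l', 'n', 'o:a', 'o:b']
rows = [[1, 7, 1, 2], [2, 7, 1, 2]]

# In-memory buffer for illustration; for a real file, use
# open('out.csv', 'w', newline='') instead (the file name is hypothetical).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headings)  # header line
writer.writerows(rows)     # one CSV line per flattened row

csv_text = buffer.getvalue()
print(csv_text)
```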
