I am trying to convert a JSON file to a CSV file that I can use for further analysis. The issue with my structure is that I end up with quite a few nested dicts/lists when I convert my JSON file.
In case anyone else finds themselves here and is looking for a solution better suited to subsequent programmatic treatment:
Flattening lists into extra columns means you then have to process the headings to work out list lengths etc. I wanted a solution where, if there are two lists of e.g. two elements each, four rows are generated, one for each valid combination of values (see below for actual examples):
from collections.abc import Mapping


class MapFlattener:
    """Flatten a nested mapping into headings plus one row per combination of list elements."""

    def __init__(self):
        self.headings = []
        self.rows = []

    def add_rows(self, headings, rows):
        # Append the new columns and cross every existing row with every new row.
        self.headings = [*self.headings, *headings]
        if self.rows:
            new_rows = []
            for base_row in self.rows:
                for row in rows:
                    new_rows.append([*base_row, *row])
            self.rows = new_rows
        else:
            self.rows = rows

    def __call__(self, mapping):
        for heading, value in mapping.items():
            if isinstance(value, Mapping):
                # Nested mapping: flatten it recursively and prefix its headings.
                sub_headings, sub_rows = MapFlattener()(value)
                sub_headings = [f'{heading}:{sub_heading}' for sub_heading in sub_headings]
                self.add_rows(sub_headings, sub_rows)
                continue
            if isinstance(value, list):
                # List: each element becomes its own candidate row.
                self.add_rows([heading], [[e] for e in value])
                continue
            # Scalar: a single value that is repeated on every row.
            self.add_rows([heading], [[value]])
        return self.headings, self.rows


def map_flatten(mapping):
    return MapFlattener()(mapping)
This creates output more in line with relational data:
In [22]: map_flatten({'l': [1,2]})
Out[22]: (['l'], [[1], [2]])
In [23]: map_flatten({'l': [1,2], 'n': 7})
Out[23]: (['l', 'n'], [[1, 7], [2, 7]])
In [24]: map_flatten({'l': [1,2], 'n': 7, 'o': {'a': 1, 'b': 2}})
Out[24]: (['l', 'n', 'o:a', 'o:b'], [[1, 7, 1, 2], [2, 7, 1, 2]])
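The case of two lists with two elements each, mentioned earlier, is not in the session above. As a rough sketch, the same row expansion can be written with `itertools.product` from the standard library. This is not the class from the answer, only an illustration of the cross product it builds, and the input dict with keys `l` and `m` is made up for the example:

```python
from itertools import product

# Hypothetical input: two lists of two elements each.
record = {'l': [1, 2], 'm': ['a', 'b']}

# One heading per key, one row per combination of list elements.
headings = list(record)
rows = [list(combo) for combo in product(*record.values())]

print(headings)  # ['l', 'm']
print(rows)      # [[1, 'a'], [1, 'b'], [2, 'a'], [2, 'b']]
```

Two lists of two elements give 2 x 2 = 4 rows, in the same order that the nested loops in `add_rows` would produce them.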
This is particularly useful if you are using the CSV in spreadsheets etc. and need to process the flattened data.
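To get from the `(headings, rows)` pair to an actual CSV, something along these lines should work with the standard `csv` module. It is a minimal sketch: the data is copied from the `Out[24]` example, and the in-memory buffer and the `flattened.csv` file name in the comment are just assumptions for illustration:

```python
import csv
import io

# Headings and rows copied from the Out[24] example.
headings = ['l', 'n', 'o:a', 'o:b']
rows = [[1, 7, 1, 2], [2, 7, 1, 2]]

# An in-memory buffer keeps the sketch self-contained; for a real file,
# open('flattened.csv', 'w', newline='') could be used instead
# (the file name is hypothetical).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headings)   # header line
writer.writerows(rows)      # one CSV line per flattened row

csv_text = buffer.getvalue()
print(csv_text)
```

Because every row already has one value per heading, no further padding or header processing is needed before writing.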