I have a nested dictionary in which the sub-dictionaries hold lists:
nested_dict = {'string1': {69: [1231, 232], 67:[682, 12], 65: [1, 1]},
`string2` :{2
This should give you the result you are looking for, although it's probably not the most elegant solution. There's probably a better (more pandas-idiomatic) way to do it.
I parsed your nested dict and built a list of dictionaries (one for each row).
import pandas as pd

# some sample input
nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
    'string3': {28673: [83, 24], 22737: [83, 94, 1103, 103], 19424: [65, 24]}
}

# new_list will hold one dict per row
new_list = []
for k1 in nested_dict:
    curr_dict = nested_dict[k1]
    for k2 in curr_dict:
        # the outer and inner keys become the first two columns
        new_dict = {'col1': k1, 'col2': k2}
        # each list element gets its own column, starting at col3
        new_dict.update({'col%d'%(i+3): curr_dict[k2][i] for i in range(len(curr_dict[k2]))})
        new_list.append(new_dict)

# create a DataFrame from new_list
df = pd.DataFrame(new_list)
The output (row order follows dict iteration order, so it may differ on your Python version):
col1 col2 col3 col4 col5 col6
0 string2 28672 82 23 NaN NaN
1 string2 22736 82 93 1102.0 102.0
2 string2 19423 64 23 NaN NaN
3 string3 19424 65 24 NaN NaN
4 string3 28673 83 24 NaN NaN
5 string3 22737 83 94 1103.0 103.0
6 string1 65 1 1 NaN NaN
7 string1 67 682 12 NaN NaN
8 string1 69 1231 232 NaN NaN
There is an assumption that the input will always contain enough data to create a col1 and a col2.
I loop through nested_dict. It is assumed that each element of nested_dict is also a dictionary. We loop through that dictionary as well (curr_dict). The keys k1 and k2 are used to populate col1 and col2. For the remaining columns, we iterate through the list contents and add a column for each element.
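The loop described above can also be written with enumerate, which avoids indexing into the list by position. This is only a sketch of the same idea: the function name rows_from_nested and the smaller sample input are made up for illustration, and it keeps the assumption of exactly two dict levels with lists at the bottom.

```python
import pandas as pd


def rows_from_nested(nested):
    """Build one row dict per (outer key, inner key) pair.

    Hypothetical helper; assumes two dict levels with lists at the bottom.
    """
    rows = []
    for outer_key, inner in nested.items():
        for inner_key, values in inner.items():
            row = {'col1': outer_key, 'col2': inner_key}
            # List elements become col3, col4, ... in order.
            for col_number, value in enumerate(values, start=3):
                row['col%d' % col_number] = value
            rows.append(row)
    return rows


# Made-up sample, smaller than the one in the answer.
sample = {
    'string1': {69: [1231, 232], 67: [682, 12]},
    'string2': {22736: [82, 93, 1102, 102]},
}
rows = rows_from_nested(sample)
df = pd.DataFrame(rows)
print(df)
```

Rows with shorter lists simply lack the later keys, and pandas fills those cells with NaN when it builds the frame.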
Here's a method which uses a recursive generator to unroll the nested dictionaries. It won't assume that you have exactly two levels, but continues unrolling each dict until it hits a list.
import pandas as pd

nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
    'string3': [101, 102]}

def unroll(data):
    if isinstance(data, dict):
        for key, value in data.items():
            # Recursively unroll the next level and prepend the key to each row.
            for row in unroll(value):
                yield [key] + row
    if isinstance(data, list):
        # This is the bottom of the structure (defines exactly one row).
        yield data

df = pd.DataFrame(list(unroll(nested_dict)))
Because unroll produces a list of lists rather than dicts, the columns will be named numerically (from 0 to 5 in this case). So you need to use rename to get the column labels you want:
df.rename(columns=lambda i: 'col{}'.format(i+1))
This returns the following result (note that the additional string3 entry is also unrolled).
col1 col2 col3 col4 col5 col6
0 string1 69 1231 232.0 NaN NaN
1 string1 67 682 12.0 NaN NaN
2 string1 65 1 1.0 NaN NaN
3 string2 28672 82 23.0 NaN NaN
4 string2 22736 82 93.0 1102.0 102.0
5 string2 19423 64 23.0 NaN NaN
6 string3 101 102 NaN NaN NaN
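Note that rename returns a new DataFrame rather than modifying df in place, so the result has to be assigned if you want to keep it. A minimal sketch, reusing the unroll generator from this answer on a smaller, made-up input (the names small and renamed are only for illustration):

```python
import pandas as pd


def unroll(data):
    if isinstance(data, dict):
        for key, value in data.items():
            # Prepend the current key to every row produced below it.
            for row in unroll(value):
                yield [key] + row
    if isinstance(data, list):
        # A list is the bottom of the structure and defines one row.
        yield data


# Made-up input mixing a two-level entry and a one-level entry.
small = {'a': {1: [10, 20], 2: [30]}, 'b': [7, 8]}
df = pd.DataFrame(list(unroll(small)))

# rename leaves df untouched and returns a relabelled copy.
renamed = df.rename(columns=lambda i: 'col{}'.format(i + 1))
print(list(df.columns))
print(list(renamed.columns))
```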