I've searched Stack Overflow for a solution to this, but all the solutions I found are slightly different from what I need.
I have a large ndarray (roughly 107 million rows) lets c
I think your input data is different from what you describe:
import pandas as pd

L = [[{'A': 5, 'C': 3, 'D': 3}],
     [{'A': 7, 'B': 9, 'F': 5}],
     [{'B': 4, 'C': 7, 'E': 6}]]
print(pd.DataFrame(L))
                          0
0  {'A': 5, 'C': 3, 'D': 3}
1  {'A': 7, 'B': 9, 'F': 5}
2  {'B': 4, 'C': 7, 'E': 6}
A possible solution is to flatten the nested lists first:
from itertools import chain

# Flatten the one-element inner lists into a flat sequence of dicts.
df = pd.DataFrame(chain.from_iterable(L)).sort_index(axis=1)
print(df)
     A    B    C    D    E    F
0  5.0  NaN  3.0  3.0  NaN  NaN
1  7.0  9.0  NaN  NaN  NaN  5.0
2  NaN  4.0  7.0  NaN  6.0  NaN
If the input data is a NumPy array, use the solution from the comment by @Code Different:
import numpy as np

arr = np.array([{'A': 5, 'C': 3, 'D': 3},
                {'A': 7, 'B': 9, 'F': 5},
                {'B': 4, 'C': 7, 'E': 6}])
# tolist() turns the object array back into a plain list of dicts.
df = pd.DataFrame(arr.tolist()).sort_index(axis=1)
print(df)
     A    B    C    D    E    F
0  5.0  NaN  3.0  3.0  NaN  NaN
1  7.0  9.0  NaN  NaN  NaN  5.0
2  NaN  4.0  7.0  NaN  6.0  NaN
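If you are not sure which of the two shapes you have, both cases can be folded into one small helper. This is only a sketch: the name dicts_to_frame is made up for illustration (it is not part of pandas or NumPy), and it assumes every element is a dict, either directly or wrapped in a list.

```python
from itertools import chain

import numpy as np
import pandas as pd


def dicts_to_frame(data):
    """Hypothetical helper: build a DataFrame from dicts stored in a
    NumPy object array, a flat list, or a list of lists of dicts."""
    # A NumPy object array becomes a plain (possibly nested) list first.
    rows = data.tolist() if isinstance(data, np.ndarray) else list(data)
    # Flatten one level when each row is itself a list wrapping dicts.
    if rows and isinstance(rows[0], list):
        rows = list(chain.from_iterable(rows))
    # Keys missing from a dict become NaN; sort the columns alphabetically.
    return pd.DataFrame(rows).sort_index(axis=1)


nested = [[{'A': 5, 'C': 3, 'D': 3}],
          [{'A': 7, 'B': 9, 'F': 5}],
          [{'B': 4, 'C': 7, 'E': 6}]]
arr = np.array([row[0] for row in nested])

df_nested = dicts_to_frame(nested)
df_arr = dicts_to_frame(arr)
print(df_nested.equals(df_arr))
```

Both calls should give the same frame as the outputs shown above. Note that the helper only looks at the first row to decide whether to flatten, so mixed input (some rows wrapped, some not) would need extra handling.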