Question
I have a pandas DataFrame with a MultiIndex, like this:
import pandas as pd
lst = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (21, 22)]
df = pd.DataFrame(lst, pd.MultiIndex.from_product([['A', 'B'], ['1','2', '3', '4']])).loc[:('B', '2')]
df["tuple"] = list(zip(df[0], df[1]))
#df:
      0   1     tuple
A 1   1   2    (1, 2)
  2   3   4    (3, 4)
  3   5   6    (5, 6)
  4   7   8    (7, 8)
B 1   9  10   (9, 10)
  2  11  12  (11, 12)
I want to collapse the column containing the tuples into one list of tuples per first-level index value. My approach is:
#dataframe to append list of tuples
new_df = pd.DataFrame([1, 2], index = list("AB") )
#voila a list of tuples
new_df["list_of_tuples"] = df["tuple"].unstack(level = -1).values.tolist()
#new_df:
   0                    list_of_tuples
A  1  [(1, 2), (3, 4), (5, 6), (7, 8)]
B  2   [(9, 10), (11, 12), None, None]
This works, but only when every first-level entry of the MultiIndex has the same number of rows. If the entries differ in length, the columns that are missing after unstack show up as None values in the list. My attempts to remove the NaN values before creating the list failed. Is there a way to keep None out of the final list of tuples?
Answer 1:
Is this what you need?
df.groupby(level=[0]).tuple.apply(list)
Out[306]:
A    [(1, 2), (3, 4), (5, 6), (7, 8)]
B                 [(9, 10), (11, 12)]
Name: tuple, dtype: object
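For reference, here is a minimal self-contained sketch of the groupby approach applied to the data from the question, including the step of attaching the result to new_df. The variable name grouped is only illustrative, and the .copy() call is an addition not present in the question, included on the assumption that it avoids a chained-assignment warning when adding a column to a slice.

```python
import pandas as pd

lst = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (21, 22)]
index = pd.MultiIndex.from_product([["A", "B"], ["1", "2", "3", "4"]])

# Keep rows up to ('B', '2') so that group B is shorter than group A.
# .copy() is an addition here, so the new column is set on an independent frame.
df = pd.DataFrame(lst, index=index).loc[:("B", "2")].copy()
df["tuple"] = list(zip(df[0], df[1]))

# Group on the first index level and collect each group's tuples into a list.
# No reshaping takes place, so no padding values are introduced.
grouped = df.groupby(level=0)["tuple"].apply(list)

# Assigning a Series aligns on the index labels "A" and "B".
new_df = pd.DataFrame([1, 2], index=list("AB"))
new_df["list_of_tuples"] = grouped

print(new_df)
```

Because each group is collected directly rather than reshaped into a rectangular table first, the lists keep their natural lengths.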
Source: https://stackoverflow.com/questions/48750682/pandas-flattening-a-multiindex-column-containing-tuples-but-ignore-missing-va