问题
An existing question, Creating a Pandas Dataframe with Multi Column Index, deals with a very "regular" DataFrame
where all columns and rows are products and all data is present.
My situation is, alas, different. I have this kind of data:
[{"street": "Euclid", "house":42, "area":123, (1,"bedrooms"):1, (1,"bathrooms"):4},
{"street": "Euclid", "house":19, "area":234, (2,"bedrooms"):3, (2,"bathrooms"):3},
{"street": "Riemann", "house":42, "area":345, (1,"bedrooms"):5,
(1,"bathrooms"):2, (2,"bedrooms"):12, (2, "bathrooms"):17},
{"street": "Riemann", "house":19, "area":456, (1,"bedrooms"):7, (1,"bathrooms"):1}]
and I want this sort of DataFrame
with both rows and columns having multi-level indexes:
area 1 2
street house bedrooms bathrooms bedrooms bathrooms
Euclid 42 123 1 4
Euclid 19 234 3 3
Riemann 42 345 5 2 12 17
Riemann 19 456 7 1
So, the row index should be
MultiIndex([("Euclid",42),("Euclid",19),("Riemann",42),("Riemann",19)],
names=["street","house"])
and the columns index should be
MultiIndex([("area",None),(1,"bedrooms"),(1,"bathrooms"),(2,"bedrooms"),(2,"bathrooms")],
names=["floor","entity"])
and I see no way to generate these indexes from the list of dictionaries I have.
回答1:
i feel there should be something better than this; hopefully someone on SO puts out sth much better:
Create a function to process each entry in the dictionary:
def process(entry):
#read in data and get the keys to be the column names
m = pd.DataFrame.from_dict(entry,orient='index').T
#set index
m = m.set_index(['street','house'])
#create multi-index columns
col1 = [ent[0] if isinstance(ent,tuple) else ent for ent in m.columns ]
col2 = [ent[-1] if isinstance(ent,tuple) else None for ent in m.columns ]
#assign multi-index column to m
m.columns=[col1,col2]
return m
Apply function above to data(i wrapped the dictionary into the data variable):
res = [process(entry) for entry in data]
concatenate to get final output
pd.concat(res)
area 1 2
NaN bedrooms bathrooms bedrooms bathrooms
street house
Euclid 42 123 1 4 NaN NaN
19 234 NaN NaN 3 3
Riemann 42 345 5 2 12 17
19 456 7 1 NaN NaN
来源:https://stackoverflow.com/questions/61181247/how-do-i-create-a-dataframe-with-multi-level-columns