Question
I created a MultiIndex DataFrame with:
df.set_index(['Field1', 'Field2'], inplace=True)
If this is not a MultiIndex DataFrame, please tell me how to make one.
I want to:
- Group by the same columns that are in the index
- Aggregate a count of each group
- Then return the whole thing as a Series with Field1 and Field2 as the index
How do I go about doing this?
ADDITIONAL INFO
I have a MultiIndex DataFrame that looks like this:
Continent      Sector  Count
Asia           1       4
               2       1
Australia      1       1
Europe         1       1
               2       3
               3       2
North America  1       1
               5       1
South America  5       1
How can I return this as a Series with [Continent, Sector] as the index?
Answer 1:
I think you need groupby with the size aggregation:
import pandas as pd

df = pd.DataFrame({'Field1': [1, 1, 1],
                   'Field2': [4, 4, 6],
                   'C': [7, 8, 9],
                   'D': [1, 3, 5],
                   'E': [5, 3, 6],
                   'F': [7, 4, 3]})
df.set_index(['Field1', 'Field2'], inplace=True)
print (df)
               C  D  E  F
Field1 Field2
1      4       7  1  5  7
       4       8  3  3  4
       6       9  5  6  3
print (df.index)
MultiIndex(levels=[[1], [4, 6]],
           labels=[[0, 0, 0], [0, 0, 1]],
           names=['Field1', 'Field2'])
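To answer the "is this a MultiIndex?" part of the question directly, here is a small sketch that checks the index type and its levels on the same toy data. One thing to flag: the printout above comes from an older pandas; since pandas 0.24 the labels attribute is called codes, so the sketch uses the newer name.

```python
import pandas as pd

# Same toy data as above, trimmed to one value column
df = pd.DataFrame({'Field1': [1, 1, 1],
                   'Field2': [4, 4, 6],
                   'C': [7, 8, 9]})
df = df.set_index(['Field1', 'Field2'])

# set_index with a list of several columns produces a MultiIndex
print(isinstance(df.index, pd.MultiIndex))  # True
print(df.index.nlevels)                     # 2
print(list(df.index.names))                 # ['Field1', 'Field2']

# Integer positions into each level (called .labels before pandas 0.24)
print([c.tolist() for c in df.index.codes])  # [[0, 0, 0], [0, 0, 1]]
```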
print (df.groupby(level=[0,1]).size())
Field1  Field2
1       4         2
        6         1
dtype: int64
print (df.groupby(level=['Field1', 'Field2']).size())
Field1  Field2
1       4         2
        6         1
dtype: int64
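The same counts can also be obtained by passing the level names straight to groupby; as far as I know, pandas 0.23 and later resolve such keys against index level names as well as columns. The sketch also gives the unnamed result of size() a name with rename(); the name 'Count' is only an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'Field1': [1, 1, 1],
                   'Field2': [4, 4, 6],
                   'C': [7, 8, 9]}).set_index(['Field1', 'Field2'])

# Group explicitly by index levels
by_level = df.groupby(level=['Field1', 'Field2']).size()
# Index level names passed as ordinary groupby keys
by_name = df.groupby(['Field1', 'Field2']).size()

# size() returns an unnamed Series; give it a name for later use
counts = by_level.rename('Count')
print(counts)
```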
print (df.groupby(level=['Field1', 'Field2']).count())
               C  D  E  F
Field1 Field2
1      4       2  2  2  2
       6       1  1  1  1
See also: What is the difference between size and count in pandas?
EDIT (after a comment):
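In short, size() counts rows per group (NaN included) and returns a Series, while count() counts non-missing values per column and returns a DataFrame. A minimal sketch, using a hypothetical NaN placed in column D to make the difference visible:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value in column 'D'
df = pd.DataFrame({'Field1': [1, 1, 1],
                   'Field2': [4, 4, 6],
                   'C': [7, 8, 9],
                   'D': [1.0, np.nan, 5.0]}).set_index(['Field1', 'Field2'])

g = df.groupby(level=['Field1', 'Field2'])

sizes = g.size()          # rows per group, NaN included -> Series
counts = g.count()        # non-NaN values per group and column -> DataFrame
d_counts = g['D'].count() # one column selected first -> Series
print(sizes, counts, d_counts, sep='\n\n')
```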
df.set_index(['Continent', 'Sector'], inplace=True)
print (df)
                      Count
Continent     Sector
Asia          1           4
              2           1
Australia     1           1
Europe        1           1
              2           3
              3           2
North America 1           1
              5           1
South America 5           1
print (df['Count'])
Continent      Sector
Asia           1         4
               2         1
Australia      1         1
Europe         1         1
               2         3
               3         2
North America  1         1
               5         1
South America  5         1
Name: Count, dtype: int64
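The return type depends on how the column is selected: a single label gives a Series, a list of labels gives a one-column DataFrame. The sketch rebuilds the question's table (values copied from it) and also shows partial indexing on the first level of the resulting Series.

```python
import pandas as pd

# The question's table as a flat DataFrame, then indexed by two columns
df = pd.DataFrame({
    'Continent': ['Asia', 'Asia', 'Australia', 'Europe', 'Europe', 'Europe',
                  'North America', 'North America', 'South America'],
    'Sector': [1, 2, 1, 1, 2, 3, 1, 5, 5],
    'Count': [4, 1, 1, 1, 3, 2, 1, 1, 1],
}).set_index(['Continent', 'Sector'])

s = df['Count']        # single label -> Series that keeps the MultiIndex
frame = df[['Count']]  # list of labels -> one-column DataFrame

print(type(s).__name__, type(frame).__name__)  # Series DataFrame
print(s.loc['Europe'])         # all sectors for one continent
print(s.loc[('Europe', 2)])    # a single value
```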
Or:
print (df.squeeze())
Continent      Sector
Asia           1         4
               2         1
Australia      1         1
Europe         1         1
               2         3
               3         2
North America  1         1
               5         1
South America  5         1
Name: Count, dtype: int64
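A caveat with squeeze(): it collapses every length-1 axis, so a frame with one row and one column becomes a scalar rather than a Series. Passing axis='columns' restricts it to the column axis. The two-row frame below is a hypothetical subset of the question's data.

```python
import pandas as pd

df = pd.DataFrame({'Continent': ['Asia', 'Asia'],
                   'Sector': [1, 2],
                   'Count': [4, 1]}).set_index(['Continent', 'Sector'])

print(type(df.squeeze()).__name__)  # Series

# With a single row, both axes have length 1 and squeeze() returns a scalar
one_row = df.iloc[:1]
print(one_row.squeeze())
# Squeezing only the columns axis always yields a Series
print(type(one_row.squeeze(axis='columns')).__name__)  # Series
```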
All together, starting from the flat DataFrame and using set_index:
print (df)
       Continent  Sector  Count
0           Asia       1      4
1           Asia       2      1
2      Australia       1      1
3         Europe       1      1
4         Europe       2      3
5         Europe       3      2
6  North America       1      1
7  North America       5      1
8  South America       5      1
print (df.set_index(['Continent', 'Sector'])['Count'])
Continent      Sector
Asia           1         4
               2         1
Australia      1         1
Europe         1         1
               2         3
               3         2
North America  1         1
               5         1
South America  5         1
Name: Count, dtype: int64
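If the data starts as raw records rather than precomputed counts, groupby on the two columns followed by size() builds the Count Series indexed by [Continent, Sector] in one step, which covers the original "group, count, return a Series" request. The raw records below are hypothetical.

```python
import pandas as pd

# Hypothetical raw data: one row per record, no Count column yet
raw = pd.DataFrame({
    'Continent': ['Asia'] * 4 + ['Asia'] + ['Europe'] * 3,
    'Sector': [1, 1, 1, 1, 2, 2, 2, 2],
})

# Group by both columns, count rows, and name the resulting Series
counts = raw.groupby(['Continent', 'Sector']).size().rename('Count')
print(counts)
```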
Answer 2:
You can just select the column from the DataFrame like this:
df['Count']
Source: https://stackoverflow.com/questions/41114831/convert-multiindex-dataframe-to-series