Finding the Index with maximum number of rows

蓝咒 submitted on 2021-02-05 09:44:38

Question


My task:

For the next set of questions, we will be using census data from the United States Census Bureau. Counties are political and geographic subdivisions of states in the United States. This dataset contains population data for counties and states in the US from 2010 to 2015. See this document for a description of the variable names.

The census dataset (census.csv) should be loaded as census_df. Answer questions using this as appropriate.

Question 5

Which state has the most counties in it? (hint: consider the sumlevel key carefully! You'll need this for future questions too...)

This function should return a single string value.

import pandas as pd
census_df = pd.read_csv('census.csv')
census_df = census_df[census_df['SUMLEV']==50]
census_df_2 = census_df.groupby(by='STNAME',axis=0)

This, however, does not group the DataFrame by 'STNAME', which can be seen when executing census_df_2.head()

I suppose this should work on a grouped DataFrame:

def answer_five():
    return census_df_2[ census_df_2['COUNTY'].count() == max( census_df_2['COUNTY'].count() ) ].index().tolist()[0]
answer_five()

Why does the groupby function not work? I've tried changing the axis and using the set_index() function instead but I can't get it to work.

If someone knows another way to solve this problem I'd appreciate it.


Answer 1:


groupby simply returns a GroupBy object; nothing is computed until you apply an aggregate function to that object, e.g. (with census_df already filtered to SUMLEV == 50, as in the question)

census_df.groupby(by='STNAME').aggregate({'COUNTY': 'nunique'}).idxmax().iloc[0]

gives

'Texas'

See the pandas user guide on grouping ("Group by: split-apply-combine") for an introduction to grouping/aggregating.
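
As a rough illustration of the point above, here is a minimal sketch on a small made-up DataFrame rather than the real census.csv. The state names, county names and the helper name state_with_most_counties are all hypothetical, and only the columns used in the question are included. It also hints at why census_df_2.head() looked ungrouped: on a GroupBy object, head() returns the first rows of each group in their original order, not an aggregated table.

```python
import pandas as pd

# Hypothetical toy data standing in for census.csv (all values are made up).
# SUMLEV 40 marks a state-level row, SUMLEV 50 a county-level row.
toy_df = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50, 50, 50],
    "STNAME": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta", "Beta"],
    "CTYNAME": ["Alpha", "A1 County", "A2 County",
                "Beta", "B1 County", "B2 County", "B3 County"],
    "COUNTY": [0, 1, 3, 0, 1, 3, 5],
})


def state_with_most_counties(df):
    """Return the STNAME that has the most county-level rows."""
    # Keep only county rows; state summary rows would inflate the counts.
    counties = df[df["SUMLEV"] == 50]
    # groupby alone is lazy; size() is the aggregation that yields one
    # row count per state.
    counts = counties.groupby("STNAME").size()
    # idxmax returns the index label (the state name) of the largest count.
    return counts.idxmax()


# The grouped object itself is not a DataFrame, only a recipe for grouping.
grouped = toy_df[toy_df["SUMLEV"] == 50].groupby("STNAME")
print(type(grouped).__name__)
print(state_with_most_counties(toy_df))
```

Using size() counts rows per state, while the nunique aggregation in the answer counts distinct COUNTY codes; on county-level rows the two should agree, assuming each county appears once per state.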



Source: https://stackoverflow.com/questions/56883626/finding-the-index-with-maximum-number-of-rows
