What is as_index in groupby in pandas?

走了就别回头了 2020-12-01 04:09

What exactly is the function of as_index in groupby in Pandas?

2 Answers
  • 2020-12-01 04:13

    When you call groupby, you can set as_index to True or False depending on whether you want the column(s) you grouped by to become the index of the output.

    import pandas as pd
    table_r = pd.DataFrame({
        'colors': ['orange', 'red', 'orange', 'red'],
        'price': [1000, 2000, 3000, 4000],
        'quantity': [500, 3000, 3000, 4000],
    })
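    # Count rows per color; with as_index=True the 'colors' key becomes the index.
    # sort_values() replaces the old DataFrame.sort(), which was removed from pandas.
    new_group = table_r.groupby('colors', as_index=True).count().sort_values('price', ascending=False)
    print(new_group)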
    

    Output:

            price  quantity
    colors                 
    orange      2         2
    red         2         2
    

    Now the same call with as_index=False:

       colors  price  quantity
    0  orange      2         2
    1     red      2         2
    

    Note that colors is no longer the index once we set as_index=False; it is an ordinary column, and the rows get a default integer index instead.
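
    As a rough sketch of the relationship between the two forms (the names indexed and flat are only illustrative, not from the original answer): for a simple aggregation like this one, grouping with as_index=True and then calling reset_index() should give the same table as grouping with as_index=False.

```python
import pandas as pd

table_r = pd.DataFrame({
    'colors': ['orange', 'red', 'orange', 'red'],
    'price': [1000, 2000, 3000, 4000],
    'quantity': [500, 3000, 3000, 4000],
})

# Grouping key ends up in the index (as_index=True is the default).
indexed = table_r.groupby('colors', as_index=True).count()

# Grouping key stays a regular column; rows get a default integer index.
flat = table_r.groupby('colors', as_index=False).count()

# reset_index() moves the 'colors' index back into a column.
print(indexed.reset_index())
print(flat)
```

    So as_index=False mostly saves a reset_index() call when you want the key back as a column.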

  • 2020-12-01 04:25

    print() is your friend when you don't understand something. It often clears up the doubt.

    Take a look:

    import pandas as pd
    
    df = pd.DataFrame(data={'books':['bk1','bk1','bk1','bk2','bk2','bk3'], 'price': [12,12,12,15,15,17]})
    
    print(df)
    
    print(df.groupby('books', as_index=True).sum())
    
    print(df.groupby('books', as_index=False).sum())
    

    Output:

      books  price
    0   bk1     12
    1   bk1     12
    2   bk1     12
    3   bk2     15
    4   bk2     15
    5   bk3     17
    
           price
    books       
    bk1       36
    bk2       30
    bk3       17
    
      books  price
    0   bk1     36
    1   bk2     30
    2   bk3     17
    

    When as_index=True (the default), the key(s) you pass to groupby() become the index of the resulting DataFrame.

    The benefits you get when you set the column as index are:

    1. Speed. When you look up rows of the grouped result through the index, e.g. df.loc['bk1'], the lookup is typically faster because the index is backed by a hash table. pandas does not have to scan the entire books column to find 'bk1'; it hashes 'bk1' and jumps straight to the matching row.

    2. Ease. When as_index=True you can write df.loc['bk1'], which is shorter than the boolean-mask form df.loc[df.books=='bk1'] that you need when books is an ordinary column.
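
    A minimal sketch of the two lookup styles, using the same data as above (by_index, by_column, total_indexed and total_masked are illustrative names, not from the original answer). Both lookups return the bk1 total of 36 shown in the output earlier; only the syntax differs.

```python
import pandas as pd

df = pd.DataFrame({
    'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
    'price': [12, 12, 12, 15, 15, 17],
})

by_index = df.groupby('books', as_index=True).sum()    # 'books' is the index
by_column = df.groupby('books', as_index=False).sum()  # 'books' is a column

# Label lookup through the index.
total_indexed = by_index.loc['bk1', 'price']

# Boolean mask over the 'books' column, then take the single matching row.
total_masked = by_column.loc[by_column['books'] == 'bk1', 'price'].iloc[0]

print(total_indexed, total_masked)
```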
