Apply Function on DataFrame Index

孤独总比滥情好 2021-01-31 01:22

What is the best way to apply a function over the index of a Pandas DataFrame? Currently I am using this verbose approach:

pd.DataFrame({"Month":


        
4 answers
  • 2021-01-31 01:31

    Many of the answers return the index as an array, which loses information such as the index name (though you could restore it with pd.Series(index.map(myfunc), name=index.name)). That approach also won't work for a MultiIndex.

    My workaround is to use rename:

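    import numpy as np
    import pandas as pd

    # Build a Series (called df here) with a two-level MultiIndex
    mix = pd.MultiIndex.from_tuples([(1, 'hi'), (2, 'there'), (3, 'dude')], names=['num', 'name'])
    data = np.random.randn(3)
    df = pd.Series(data, index=mix)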
    print(df)
    num  name 
    1    hi       1.249914
    2    there   -0.414358
    3    dude     0.987852
    dtype: float64
    
    # Define a few dictionaries to denote the mapping
    rename_dict = {i: i*100 for i in df.index.get_level_values('num')}
    rename_dict.update({i: i+'_yeah!' for i in df.index.get_level_values('name')})
    df = df.rename(index=rename_dict)
    print(df)
    num  name       
    100  hi_yeah!       1.249914
    200  there_yeah!   -0.414358
    300  dude_yeah!     0.987852
    dtype: float64
    

    The one catch is that labels must be unique across the different MultiIndex levels, because a single dictionary is applied to every level. Maybe someone more clever than me knows how to get around that; for my purposes this works 95% of the time.
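
    One possible way around the unique-label requirement is to restrict each mapping to a single level. This is a sketch that assumes your pandas version supports the level argument of rename; the data and the variable name s are made up for illustration.

```python
import pandas as pd

# Illustrative data: both levels contain the labels 1 and 2
mix = pd.MultiIndex.from_tuples([(1, 1), (2, 2)], names=['x', 'y'])
s = pd.Series([10.0, 20.0], index=mix)

# Only the 'x' level is renamed; 'y' keeps its labels
s = s.rename(index=lambda label: label * 100, level='x')
print(s.index.tolist())
```

    Passing a dict instead of a function works the same way, so one dict per level avoids collisions between levels.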

  • 2021-01-31 01:38

    You can always convert an index using its to_series() method, and then either apply or map, according to your preferences/needs.

    ret = df.index.map(foo)                # Returns pd.Index
    ret = df.index.to_series().map(foo)    # Returns pd.Series
    ret = df.index.to_series().apply(foo)  # Returns pd.Series
    

    All of the above can be assigned directly to a new or existing column of df:

    df["column"] = ret
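
    A small sketch of what each variant returns and how it lands in a column. The frame and foo (here str.lower) are placeholders, not the asker's data.

```python
import pandas as pd

df = pd.DataFrame({'d': [1, 2, 3]}, index=['FOO', 'BAR', 'BAZ'])
foo = str.lower  # placeholder function for illustration

as_index = df.index.map(foo)               # pd.Index of new labels
as_series = df.index.to_series().map(foo)  # pd.Series keyed by the original labels

# Both forms line up with the rows here, so either can become a column
df['from_index'] = as_index
df['from_series'] = as_series
print(df)
```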
    

    Just for completeness: pd.Index.map, pd.Series.map and pd.Series.apply all operate element-wise. I often use map to apply lookups represented by dicts or pd.Series. By contrast, apply is more generic because you can pass any function along with additional args or kwargs. The differences between apply and map are further discussed in this SO thread. I don't know why pd.Index.apply was omitted.
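
    To illustrate the difference, here is a minimal sketch with a dict lookup via map and extra arguments via apply. The lookup table and the add_affix helper are hypothetical names chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({'d': [1, 2, 3]}, index=['FOO', 'BAR', 'BAZ'])
labels = df.index.to_series()

# map with a dict works as a lookup; labels missing from the dict become NaN
lookup = {'FOO': 'first', 'BAR': 'second'}
mapped = labels.map(lookup)

# apply forwards extra positional (args) and keyword arguments to the function
def add_affix(label, prefix, suffix=''):
    return prefix + label + suffix

applied = labels.apply(add_affix, args=('x_',), suffix='_y')
print(mapped, applied, sep='\n')
```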

  • 2021-01-31 01:39

    Assuming that you want to make a column in your current DataFrame by applying your function "foo" to the index, you could write:

    df['Month'] = df.index.map(foo)
    

    To generate the Series alone, you could instead do:

    pd.Series({x: foo(x) for x in df.index})
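
    Both forms can be tried end to end with a sketch like the following. The question's real data is not shown, so the DatetimeIndex and the body of foo (returning the month number) are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical data: the asker's actual frame is not shown in the question
df = pd.DataFrame(
    {'value': [10, 20, 30]},
    index=pd.to_datetime(['2021-01-15', '2021-02-15', '2021-03-15']),
)

def foo(ts):
    """Hypothetical stand-in for the asker's function: return the month number."""
    return ts.month

# New column computed from the index
df['Month'] = df.index.map(foo)

# Standalone Series keyed by the original index labels
months = pd.Series({x: foo(x) for x in df.index})
print(df)
print(months)
```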
    
  • 2021-01-31 01:45

    As already suggested by HYRY in the comments, map is the way to go here. Just assign the result back to the index.

    Simple example:

    df = pd.DataFrame({'d': [1, 2, 3]}, index=['FOO', 'BAR', 'BAZ'])
    df
            d
    FOO     1
    BAR     2
    BAZ     3
    
    df.index = df.index.map(str.lower)
    df
            d
    foo     1
    bar     2
    baz     3
    

    Index != Series

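    As pointed out by @OP, the df.index.map(str.lower) call does not return a Series. Older pandas versions returned a NumPy array here, while current versions return an Index. This is because DataFrame indices are Index objects backed by arrays, not Series.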

    The only way of making the index into a Series is to create a Series from it.

    pd.Series(df.index.map(str.lower))
    

    Caveat

    The Index class now subclasses the StringAccessorMixin, which means that you can do the above operation as follows:

    df.index.str.lower()
    

    This still produces an Index object, not a Series.
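
    A short sketch of the .str route, including one way to get a Series when that is what you need. The index name "key" and the variable names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {'d': [1, 2, 3]},
    index=pd.Index(['FOO', 'BAR', 'BAZ'], name='key'),
)

# The .str accessor works on the Index directly and returns an Index
lowered = df.index.str.lower()

# to_series turns it into a Series; index= keeps the original labels as keys
as_series = lowered.to_series(index=df.index)
print(lowered)
print(as_series)
```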
