What is the best way to apply a function over the index of a Pandas DataFrame?
Currently I am using this verbose approach:
pd.DataFrame({"Month":
A lot of answers return the Index as an array, which loses information such as the index name (though you could do pd.Series(index.map(myfunc), name=index.name)). That also won't work for a MultiIndex.
The way that I worked with this is to use "rename":
import numpy as np
import pandas as pd

mix = pd.MultiIndex.from_tuples([[1, 'hi'], [2, 'there'], [3, 'dude']], names=['num', 'name'])
data = np.random.randn(3)
df = pd.Series(data, index=mix)
print(df)
num name
1 hi 1.249914
2 there -0.414358
3 dude 0.987852
dtype: float64
# Define a few dictionaries to denote the mapping
rename_dict = {i: i*100 for i in df.index.get_level_values('num')}
rename_dict.update({i: i+'_yeah!' for i in df.index.get_level_values('name')})
df = df.rename(index=rename_dict)
print(df)
num name
100 hi_yeah! 1.249914
200 there_yeah! -0.414358
300 dude_yeah! 0.987852
dtype: float64
The only catch is that the rename dict is applied to every level, so labels need to be unique between the different MultiIndex levels. Maybe someone more clever than me knows how to get around that. For my purposes this works 95% of the time.
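One possible way around the unique-label restriction, offered as a sketch rather than a definitive fix: transform each level separately with get_level_values and rebuild the MultiIndex from the results. The variable names (s, new_num, new_name) are only illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

mix = pd.MultiIndex.from_tuples(
    [(1, 'hi'), (2, 'there'), (3, 'dude')], names=['num', 'name']
)
s = pd.Series(np.random.randn(3), index=mix)

# Map each level on its own, so a label that appears in two levels
# cannot be picked up by the wrong mapping.
new_num = s.index.get_level_values('num').map(lambda i: i * 100)
new_name = s.index.get_level_values('name').map(lambda n: n + '_yeah!')

# Rebuild the MultiIndex and keep the original level names.
s.index = pd.MultiIndex.from_arrays([new_num, new_name], names=s.index.names)
print(s)
```

This trades the single rename call for one map per level, which is more typing but does not depend on the labels being distinct across levels.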
You can always convert an index using its to_series() method, and then either apply or map, according to your preferences/needs.
ret = df.index.map(foo) # Returns pd.Index
ret = df.index.to_series().map(foo) # Returns pd.Series
ret = df.index.to_series().apply(foo) # Returns pd.Series
All of the above can be assigned directly to a new or existing column of df:
df["column"] = ret
Just for completeness: pd.Index.map, pd.Series.map and pd.Series.apply all operate element-wise. I often use map to apply lookups represented by dicts or pd.Series. apply is more generic because you can pass any function along with additional args or kwargs. The differences between apply and map are further discussed in this SO thread. I don't know why pd.Index.apply was omitted.
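To make the map/apply distinction concrete, here is a small sketch. The frame, the lookup dict and the add_suffix helper are all made up for illustration; only to_series, map and apply come from the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {'d': [1, 2, 3]},
    index=pd.Index(['FOO', 'BAR', 'BAZ'], name='key'),
)

# map with a dict: an element-wise lookup. Labels that are missing
# from the dict ('BAZ' here) come back as NaN.
lookup = {'FOO': 'first', 'BAR': 'second'}
mapped = df.index.to_series().map(lookup)

# apply with extra arguments: add_suffix is a hypothetical helper.
def add_suffix(label, suffix, upper=False):
    out = label + suffix
    return out.upper() if upper else out

applied = df.index.to_series().apply(add_suffix, args=('_x',), upper=True)

# Both results are Series indexed by the original labels, so they
# align with df when assigned as columns.
df['mapped'] = mapped
df['applied'] = applied
print(df)
```

The dict lookup is the case where map reads most naturally; as soon as the function needs extra parameters, apply with args/kwargs avoids wrapping it in a lambda.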
Assuming that you want to make a column in your current DataFrame by applying your function "foo" to the index, you could write:
df['Month'] = df.index.map(foo)
To generate the series alone you could instead do:
pd.Series({x: foo(x) for x in df.index})
As already suggested by HYRY in the comments, Series.map is the way to go here. Just set the index to the resulting series.
Simple example:
df = pd.DataFrame({'d': [1, 2, 3]}, index=['FOO', 'BAR', 'BAZ'])
df
d
FOO 1
BAR 2
BAZ 3
df.index = df.index.map(str.lower)
df
d
foo 1
bar 2
baz 3
As pointed out by @OP, the df.index.map(str.lower) call returns a numpy array (in older pandas versions; newer versions return an Index).
This is because dataframe indices are based on numpy arrays, not Series.
The only way of making the index into a Series is to create a Series from it.
pd.Series(df.index.map(str.lower))
The Index class now subclasses the StringAccessorMixin, which means that you can do the above operation as follows:
df.index.str.lower()
This still produces an Index object, not a Series.
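If a Series is what you actually need, the Index returned by the str accessor can be wrapped explicitly. A minimal sketch, assuming a named index ('key' is an invented name) and that you want the result keyed by the original labels:

```python
import pandas as pd

df = pd.DataFrame(
    {'d': [1, 2, 3]},
    index=pd.Index(['FOO', 'BAR', 'BAZ'], name='key'),
)

# The str accessor on an Index gives back an Index, not a Series.
lowered = df.index.str.lower()

# Wrap it in a Series, passing the original index and name explicitly
# so that neither is lost along the way.
as_series = pd.Series(lowered, index=df.index, name=df.index.name)
print(as_series)
```

Passing index= and name= by hand is slightly verbose, but it makes it obvious which labels the lowered values belong to.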