I have a DataFrame with a hierarchical index on axis 1 (the columns), produced by a groupby.agg operation:
USAF WBAN year month day s_PC s_CL
How can I flatten it into a single level of column names?
Here's a straightforward way that worked for me:
df.columns = [" ".join(str(elem) for elem in tup).strip() for tup in df.columns]
# df = df.reset_index()  # if needed
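A minimal runnable sketch of the space-join approach, using small hypothetical data in place of the question's weather frame. It calls reset_index() before flattening: the group keys then become columns such as ('USAF', ''), which is why stripping the trailing space from the joined name matters.

```python
import pandas as pd

# Hypothetical sample data standing in for the frame in the question
raw = pd.DataFrame({
    "USAF": [702730, 702730, 702730],
    "day": [1, 1, 2],
    "s_PC": [1, 0, 3],
    "tempf": [30.9, 24.9, 28.0],
})

# groupby.agg with lists of functions produces two-level (MultiIndex) columns
df = raw.groupby(["USAF", "day"]).agg({"s_PC": ["sum"], "tempf": ["max", "min"]})

# Group keys become columns whose second level is '' (e.g. ('USAF', ''))
df = df.reset_index()

# Join each column tuple with a space; strip() drops the trailing space from 'USAF '
df.columns = [" ".join(str(elem) for elem in tup).strip() for tup in df.columns]
print(df.columns.tolist())  # ['USAF', 'day', 's_PC sum', 'tempf max', 'tempf min']
```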
If you want to keep the aggregation info from the second level of the MultiIndex, you can concatenate the levels of each column tuple:
In [1]: new_cols = [''.join(t) for t in df.columns]
In [2]: new_cols
Out[2]:
['USAF',
 'WBAN',
 'day',
 'month',
 's_CDsum',
 's_CLsum',
 's_CNTsum',
 's_PCsum',
 'tempfamax',
 'tempfamin',
 'year']
In [3]: df.columns = new_cols
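A self-contained sketch of the same no-separator join, written with Index.map instead of a list comprehension. The data is hypothetical, and 'max'/'min' stand in for the question's amax/amin aggregations. Note that ''.join only works when every level is a string.

```python
import pandas as pd

# Hypothetical data; the names only mimic the question's columns
raw = pd.DataFrame({
    "USAF": [702730, 702730, 702731],
    "s_CL": [0, 2, 5],
    "tempf": [30.92, 24.98, 28.04],
})
df = raw.groupby("USAF").agg({"s_CL": ["sum"], "tempf": ["max", "min"]})

# Same result as [''.join(t) for t in df.columns]: each tuple is joined with no separator
df.columns = df.columns.map("".join)
print(df.columns.tolist())  # ['s_CLsum', 'tempfmax', 'tempfmin']
```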
Most of the answers on this thread predate pandas 0.24.0. As of that version, MultiIndex.to_flat_index() converts a column MultiIndex into a flat Index of tuples, which may be all you need.
From the pandas documentation:
MultiIndex.to_flat_index()
Convert a MultiIndex to an Index of Tuples containing the level values.
A simple example from its documentation:
import pandas as pd
print(pd.__version__)  # to_flat_index() requires '0.24.0' or later
index = pd.MultiIndex.from_product(
[['foo', 'bar'], ['baz', 'qux']],
names=['a', 'b'])
print(index)
# MultiIndex(levels=[['bar', 'foo'], ['baz', 'qux']],
# codes=[[1, 1, 0, 0], [0, 1, 0, 1]],
# names=['a', 'b'])
Applying to_flat_index():
index.to_flat_index()
# Index([('foo', 'baz'), ('foo', 'qux'), ('bar', 'baz'), ('bar', 'qux')], dtype='object')
An example of how you'd use it on dat, which is a pandas DataFrame with MultiIndex columns:
dat = df.loc[:,['name','workshop_period','class_size']].groupby(['name','workshop_period']).describe()
print(dat.columns)
# MultiIndex(levels=[['class_size'], ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']],
# codes=[[0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6, 7]])
dat.columns = dat.columns.to_flat_index()
print(dat.columns)
# Index([('class_size', 'count'), ('class_size', 'mean'),
# ('class_size', 'std'), ('class_size', 'min'),
# ('class_size', '25%'), ('class_size', '50%'),
# ('class_size', '75%'), ('class_size', 'max')],
# dtype='object')
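Because to_flat_index() leaves tuples as the column labels, a common follow-up is to join each tuple into a plain string. A minimal sketch, assuming a hypothetical two-level frame (the 'score' column is made up for illustration):

```python
import pandas as pd

# Hypothetical frame with a two-level column index
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    columns=pd.MultiIndex.from_tuples(
        [("class_size", "count"), ("class_size", "mean"), ("score", "max")]
    ),
)

# to_flat_index() returns an Index whose labels are tuples
flat = df.columns.to_flat_index()
print(flat.tolist())  # [('class_size', 'count'), ('class_size', 'mean'), ('score', 'max')]

# Tuples are valid labels, but joining them gives ordinary string names
df.columns = ["_".join(col) for col in flat]
print(df.columns.tolist())  # ['class_size_count', 'class_size_mean', 'score_max']
```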
df.columns = ['_'.join(tup).rstrip('_') for tup in df.columns.values]
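The rstrip('_') in that one-liner matters after reset_index(), which gives the former index columns an empty second level. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data; WBAN plays the role of a group key
raw = pd.DataFrame({"WBAN": [26451, 26451, 12345], "s_CD": [12, 8, 3]})
df = raw.groupby("WBAN").agg({"s_CD": ["sum", "max"]}).reset_index()

# The key column is now ('WBAN', ''), so a plain join would produce 'WBAN_'
print(df.columns.tolist())  # [('WBAN', ''), ('s_CD', 'sum'), ('s_CD', 'max')]

df.columns = ['_'.join(tup).rstrip('_') for tup in df.columns.values]
print(df.columns.tolist())  # ['WBAN', 's_CD_sum', 's_CD_max']
```

One caveat of this design: rstrip('_') also removes trailing underscores that were legitimately part of a column name.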
If you want a separator between the levels in the flattened name, this function works well:
def flattenHierarchicalCol(col, sep='_'):
    # Non-tuple labels (ordinary single-level columns) pass through unchanged
    if not isinstance(col, tuple):
        return col
    # Join the non-empty levels with sep; str() allows non-string levels
    return sep.join(str(level) for level in col if level != '')

df.columns = df.columns.map(flattenHierarchicalCol)
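To use a separator other than the default, you can wrap the call in a lambda. This sketch restates the function so it runs on its own; the sample columns are hypothetical.

```python
import pandas as pd

def flattenHierarchicalCol(col, sep='_'):
    # Restated here so the snippet is self-contained
    if not isinstance(col, tuple):
        return col
    return sep.join(str(level) for level in col if level != '')

# Hypothetical frame: one column with an empty second level, two aggregated ones
df = pd.DataFrame(
    [[1, 2.5, 0.5]],
    columns=pd.MultiIndex.from_tuples(
        [("station", ""), ("temp", "mean"), ("temp", "std")]
    ),
)

# Pass a different separator through a lambda
df.columns = df.columns.map(lambda c: flattenHierarchicalCol(c, sep='.'))
print(df.columns.tolist())  # ['station', 'temp.mean', 'temp.std']
```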
Following @jxstanford and @tvt173, I wrote a quick function that should do the trick regardless of whether the column names are strings or integers:
def flatten_cols(df):
    # str() each level so integer labels can be joined too
    df.columns = [
        '_'.join(tuple(map(str, t))).rstrip('_')
        for t in df.columns.values
    ]
    return df
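A minimal sketch of where integer labels come from: unstacking an integer index level (here a hypothetical year column) adds it as a column level, and a plain '_'.join on those tuples would raise TypeError. The function is restated so the snippet runs on its own. It expects MultiIndex columns; on a single-level integer index, map(str, t) would fail because an int is not iterable.

```python
import pandas as pd

def flatten_cols(df):
    # Restated here: str() each level so integer labels can be joined
    df.columns = [
        '_'.join(tuple(map(str, t))).rstrip('_')
        for t in df.columns.values
    ]
    return df

# Hypothetical sales data with an integer year column
sales = pd.DataFrame({
    "store": ["a", "a", "b", "b"],
    "year": [2019, 2020, 2019, 2020],
    "units": [10, 12, 7, 9],
})

# Columns become ('units', 'sum', 2019) and ('units', 'sum', 2020)
wide = sales.groupby(["store", "year"]).agg({"units": ["sum"]}).unstack("year")
wide = flatten_cols(wide)
print(wide.columns.tolist())  # ['units_sum_2019', 'units_sum_2020']
```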