I have a dataframe with columns like:
[\'id\',\'name\',\'foo1\', \'foo1\', \'foo1\', \'foo2\',\'foo2\', \'foo3\']
I would like to get a new dat
The basic idea is that you can group by your columns names and do mean operations for each group.
I saw some comments for your question and tried to give you different ways to achieve the goal. (Solution (3) is the best I found!)
(1) Quick solution. If you have very limited columns that are non-numeric, and own unique names, e.g., columns id
and name
. What you can do is:
First set index ['id', 'name']
to preserve them,
df = df.set_index(['id', 'name'])
then use DataFrame.groupby
function on columns
, set axis=1
(iterate over each column), apply mean
function for each group.
df.groupby(by=df.columns, axis=1).mean()
And finally, reset index to recover ['id', 'name']
columns
df = df.reset_index()
Here is a sample code:
In [35]: df = pd.DataFrame([['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]], columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'], index=list('AB'))
In [36]: df = df.set_index(['id', 'name'])
In [37]: df = df.groupby(by=df.columns, axis=1).mean()
In [38]: df = df.reset_index()
In [39]: df
Out[39]:
id name c1 c2 c3
0 001 a 1 55 1000
1 002 b 2 110 2000
(2) Complete solution. If you have lots of columns that are non-numeric and unique named, what you can do is:
First transpose you dataframe,
df2 = df.transpose()
Then you do group by operations (on its index and axis=0
), but carefully handle each groups: for these numeric groups, return their mean value; and for these non-numeric groups, return their first row:
df2 = df2.groupby(by=df2.index, axis=0).apply(lambda g: g.mean() if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[0])
And finally, transpose back:
df = df2.transpose()
Here is sample of code:
In [98]: df = pd.DataFrame([['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]], columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'], index=list('AB'))
In [99]: df2 = df.transpose()
In [100]: df2 = df2.groupby(by=df2.index, axis=0).apply(lambda g: g.mean() if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[0])
In [101]: df3 = df2.transpose()
In [102]: df3
Out[102]:
c1 c2 c3 id name
A 1 55 1000 001 a
B 2 110 2000 002 b
In [103]: df
Out[103]:
id name c1 c2 c2 c3
A 001 a 1 10 100 1000
B 002 b 2 20 200 2000
You need to import numbers
More notes:
(3) All in one! This solution is the best I found:
df.groupby(by=df.columns, axis=1).apply(lambda g: g.mean(axis=1) if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[:,0])
I tried to handle each group for the un-transposed groups, that is,
df.groupby(by=df.columns, axis=1).apply(gf)
And
gf = lambda g: g.mean(axis=1) if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[:,0]
I failed before, because I do not carefully hand the axis. You must set axis=1
for mean
function, and return columns for non-numeric groups.
Thanks!