Here is a pandas DataFrame I would like to manipulate:
import pandas as pd
data = {\"grouping\": [\"item1\", \"item1\", \"item1\", \"item2\", \"item2\", \"
Use set_index
and unstack
:
df = df.set_index(['grouping','labels']).unstack().rename_axis(None)
df.columns = df.columns.droplevel()
print(df)
Output:
labels A B C D
item1 5 1 8 None
item2 3 731 189 9
There are four idiomatic pandas
ways to do this.
pivot
set_index
pivot_table
groupby
pivot
df.pivot('grouping', 'labels', 'count')
set_index
df.set_index(['grouping', 'labels'])['count'].unstack()
pivot_table
df.pivot_table('count', 'grouping', 'labels')
groupby
df.groupby(['grouping', 'labels'])['count'].sum().unstack()
All yield
labels A B C D
grouping
item1 5.0 1.0 8.0 NaN
item2 3.0 731.0 189.0 9.0
timing
With the groupby
, set_index
, or pivot_table
approach, you can easily fill in missing values with fill_value=0
df.pivot_table('count', 'grouping', 'labels', fill_value=0)
df.groupby(['grouping', 'labels'])['count'].sum().unstack(fill_value=0)
df.set_index(['grouping', 'labels'])['count'].sum().unstack(fill_value=0)
All yield
labels A B C D
grouping
item1 5 1 8 0
item2 3 731 189 9
Additional thoughts on groupby
Because we don't require any aggregation. If we wanted to use groupby
, we can minimize the impact of the implicit aggregation by utilizing a less impactful aggregator.
df.groupby(['grouping', 'labels'])['count'].max().unstack()
or
df.groupby(['grouping', 'labels'])['count'].first().unstack()
timing groupby
Try:
In [1]: import pandas as pd
...:
...: data = {"grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
...: "labels": ["A", "B", "C", "A", "B", "C", "D"],
...: "count": [5, 1, 8, 3, 731, 189, 9]}
...:
In [2]: df = pd.DataFrame(data)
In [3]: df.pivot_table(index="grouping",columns="labels")
Out[3]:
count
labels A B C D
grouping
item1 5 1 8 NaN
item2 3 731 189 9
You put labels
in the index, but you want it in the columns:
>>> df.pivot_table(index='grouping', columns='labels')
count
labels A B C D
grouping
item1 5.0 1.0 8.0 NaN
item2 3.0 731.0 189.0 9.0
Note that this makes the columns a MultiIndex. If you don't want that, explicitly pass values
: df.pivot_table(index='grouping', columns='labels', values='count')
.
Also, note that the kind of reshape you seem to be looking for will only be possible if each combination of grouping and label has exactly one or zero values. If any combination occurs more than once, you need to decide how to aggregate them (e.g., by summing the matching values).