Question
I have come across a nice solution for inserting subtotals into a pandas groupby DataFrame. Now I would like to modify the result to show values relative to the subtotals instead of the absolute values.
This is the code to show the groupby:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        "Category": np.random.choice(["Group A", "Group B"], 50),
        "Product": np.random.choice(["Product 1", "Product 2"], 50),
        "Units_Sold": np.random.randint(1, 100, size=50),
        "Date": np.random.choice(
            pd.date_range("1/1/2011", "03/31/2011", freq="D"), 50, replace=False
        ),
    }
)
iList = ["Category", "Product"]
# i=1 blanks out "Product", so the groupby yields one subtotal row per Category;
# i=2 blanks out nothing, so the groupby yields the per-product rows.
# numeric_only=True keeps the "Date" column out of the sum.
pvt = pd.concat(
    [
        df.assign(**{x: "" for x in iList[i:]}).groupby(iList).sum(numeric_only=True)
        for i in range(1, 3)
    ]
).sort_index()
print(pvt)
which results in
                    Units_Sold
Category Product
Group A                   1170
         Product 1         434
         Product 2         736
Group B                    980
         Product 1         437
         Product 2         543
I have tried something along the lines of
pvt.transform(lambda x: (round(x / x['Group A', ''],2)*100).astype(int).astype(str)+"%")
but obviously this only calculates values relative to the first row.
What I am looking for is
                    Units_Sold
Category Product
Group A                   100%
         Product 1         37%
         Product 2         63%
Group B                   100%
         Product 1         45%
         Product 2         55%
Thanks a lot!
Answer 1:
Use GroupBy.apply on the first level of the MultiIndex with a lambda function:
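f = lambda x: (x / x.iloc[0]).mul(100).round(2).astype(int).astype(str)+"%"
# group_keys=False keeps the original index; newer pandas versions
# would otherwise prepend the group key as an extra index level.
df = pvt.groupby(level=0, group_keys=False).apply(f)
print(df)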
                    Units_Sold
Category Product
Group A                   100%
         Product 1         49%
         Product 2         50%
Group B                   100%
         Product 1         52%
         Product 2         47%
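Dividing by `x.iloc[0]` relies on the subtotal row being the first row of each Category group. That holds here because the subtotal rows carry an empty string as their Product label and `sort_index()` places `""` ahead of `"Product 1"`. The sketch below checks that assumption and shows an order-independent variant that looks the subtotals up by label. The fixed data is a hypothetical stand-in that mirrors the table printed in the question, and the names `first_rows`, `subtotals` and `ratio` are illustrative only.

```python
import pandas as pd

# Hypothetical fixed data that mirrors the table printed in the question.
index = pd.MultiIndex.from_tuples(
    [
        ("Group A", ""), ("Group A", "Product 1"), ("Group A", "Product 2"),
        ("Group B", ""), ("Group B", "Product 1"), ("Group B", "Product 2"),
    ],
    names=["Category", "Product"],
)
pvt = pd.DataFrame({"Units_Sold": [1170, 434, 736, 980, 437, 543]}, index=index)

# Check the ordering assumption: the first row of each Category is the subtotal.
first_rows = pvt.groupby(level=0).head(1)
print(first_rows.index.get_level_values("Product").tolist())

# Order-independent variant: select the subtotals by their "" label,
# then divide while aligning on the Category level only.
subtotals = pvt.xs("", level="Product")["Units_Sold"]  # indexed by Category
ratio = pvt["Units_Sold"].div(subtotals, level="Category")
print(ratio)
```

The label-based lookup may be preferable if the frame could be re-sorted or if the subtotal label is later changed to something like "Total", which would no longer sort first.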
Or use GroupBy.transform with GroupBy.first:
df = (pvt.div(pvt.groupby(level=0).transform('first'))
         .mul(100)
         .round(2)
         .astype(int)
         .astype(str) + "%")
print(df)
                    Units_Sold
Category Product
Group A                   100%
         Product 1         43%
         Product 2         56%
Group B                   100%
         Product 1         58%
         Product 2         41%
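The figures printed in this answer differ from those in the question because the sample data is random. Note also that `.round(2).astype(int)` truncates rather than rounds: the percentage is rounded to two decimals and the cast to int then drops the fractional part, which is why the product shares above add up to 99%. A minimal sketch of a round-to-nearest variant follows, again using hypothetical fixed data that mirrors the question's table. The names `share`, `truncated` and `rounded` are illustrative.

```python
import pandas as pd

# Hypothetical fixed data that mirrors the table printed in the question.
index = pd.MultiIndex.from_tuples(
    [
        ("Group A", ""), ("Group A", "Product 1"), ("Group A", "Product 2"),
        ("Group B", ""), ("Group B", "Product 1"), ("Group B", "Product 2"),
    ],
    names=["Category", "Product"],
)
pvt = pd.DataFrame({"Units_Sold": [1170, 434, 736, 980, 437, 543]}, index=index)

# Percentage of each row relative to the first (subtotal) row of its Category.
share = pvt.div(pvt.groupby(level=0).transform("first")).mul(100)

# As written in the answer: the int cast drops the fractional part.
truncated = share.round(2).astype(int).astype(str) + "%"

# Round to the nearest whole percent before casting.
rounded = share.round().astype(int).astype(str) + "%"

print(truncated)
print(rounded)
```

With this data the rounded variant gives 37% / 63% for Group A and 45% / 55% for Group B, matching the output requested in the question, while the truncating chain gives 37% / 62% and 44% / 55%.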
Source: https://stackoverflow.com/questions/59840968/transform-pandas-groupby-result-with-subtotals-to-relative-values