问题
I have been struggling with finding a way to expand/clone observation rows based on a pre-determined number and a grouping variable (id). For context, here is an example data frame using pandas and numpy (python3).
df = pd.DataFrame([[1, 15], [2, 20]], columns = ['id', 'num'])
df
Out[54]:
id num
0 1 15
1 2 20
I want to expand/clone the rows by the number given in the "num" variable based on their ID group. In this case, I would want 15 rows for id = 1 and 20 rows for id = 2. This is probably an easy question, but I am struggling to make this work. I've been messing around with reindex and np.repeat, but the conceptual pieces are not fitting together for me.
In R, I used the expandRows function found in the splitstackshape package, which would look something like this:
library(splitstackshape)
df <- data.frame(id = c(1, 2), num = c(15, 20))
df
id num
1 1 15
2 2 20
df2 <- expandRows(df, "num", drop = FALSE)
df2
id num
1 1 15
1.1 1 15
1.2 1 15
1.3 1 15
1.4 1 15
1.5 1 15
1.6 1 15
1.7 1 15
1.8 1 15
1.9 1 15
1.10 1 15
1.11 1 15
1.12 1 15
1.13 1 15
1.14 1 15
2 2 20
2.1 2 20
2.2 2 20
2.3 2 20
2.4 2 20
2.5 2 20
2.6 2 20
2.7 2 20
2.8 2 20
2.9 2 20
2.10 2 20
2.11 2 20
2.12 2 20
2.13 2 20
2.14 2 20
2.15 2 20
2.16 2 20
2.17 2 20
2.18 2 20
2.19 2 20
Again, sorry if this is a stupid question and thanks in advance for any help.
回答1:
I can't replicate your index, but I can replicate your values, using np.repeat
, quite easily in fact.
v = df.values
df = pd.DataFrame(v.repeat(v[:, -1], axis=0), columns=df.columns)
If you want the exact index (although I can't see why you'd need to), you'd need a groupby
operation -
def f(x):
return x.astype(str) + '.' + np.arange(len(x)).astype(str)
idx = df.groupby('id').id.apply(f).values
Assign idx
to df
's index -
df.index = idx
来源:https://stackoverflow.com/questions/48012083/expanding-pandas-data-frame-rows-based-on-number-and-group-id-python-3