Question
I had a similar question using PostgreSQL, but this kind of task turned out to be really hard to do there. I think Python/pandas would make it a lot easier, although I still can't quite come up with the solution.
I now have a pandas DataFrame that looks like this:
import pandas as pd
df = {'planid': ['A', 'A', 'B', 'B', 'C', 'C'],
      'x': ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']}
df = pd.DataFrame(df)
df
planid x
0 A a1
1 A a2
2 B b1
3 B b2
4 C c1
5 C c2
I want to get all possible permutations where the planid values all differ from each other. In other words, think of each value in planid as a "bucket": I want every combination I could get by drawing one value of x from each "bucket" in planid. In this particular example, there are 8 permutations in total: {(a1, b1, c1), (a1, b2, c1), (a1, b1, c2), (a1, b2, c2), (a2, b1, c1), (a2, b2, c1), (a2, b1, c2), (a2, b2, c2)}.
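Strictly speaking, drawing one value from each bucket is a Cartesian product rather than a set of permutations. A minimal sketch with the standard library's itertools.product, assuming the DataFrame above (the names buckets and combos are just for illustration), confirms the count of 8:

```python
import itertools

import pandas as pd

df = pd.DataFrame({'planid': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'x': ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']})

# One "bucket" per planid: the list of x values in that group (groupby sorts keys: A, B, C)
buckets = [group['x'].tolist() for _, group in df.groupby('planid')]

# The Cartesian product draws exactly one value from each bucket
combos = list(itertools.product(*buckets))

print(len(combos))   # 8 = 2 * 2 * 2
print(combos[:2])    # [('a1', 'b1', 'c1'), ('a1', 'b1', 'c2')]
```

The tuples come out in a different order than listed above (the last bucket varies fastest), but the set is the same.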
However, I want my resulting data frame to have three columns: planid, x, and another column, perhaps named permutation_counter, that labels each of the different permutations. In other words, I want my final dataframe to look like this:
planid x permutation_counter
0 A a1 1
1 B b1 1
2 C c1 1
3 A a1 2
4 B b2 2
5 C c1 2
6 A a1 3
7 B b1 3
8 C c2 3
9 A a1 4
10 B b2 4
11 C c2 4
12 A a2 5
13 B b1 5
14 C c1 5
15 A a2 6
16 B b2 6
17 C c1 6
18 A a2 7
19 B b1 7
20 C c2 7
21 A a2 8
22 B b2 8
23 C c2 8
Any help would be greatly appreciated!
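A minimal sketch of one way to build the requested long format directly, assuming a DataFrame shaped like the one above. The helper permutation_frame and its parameters are hypothetical names chosen for illustration, not a pandas API:

```python
import itertools

import pandas as pd


def permutation_frame(df, group_col='planid', value_col='x'):
    """Hypothetical helper: one row per (combination, group), combinations numbered from 1.

    Draws one value_col value from each group_col bucket via itertools.product.
    """
    groups = df.groupby(group_col)
    keys = [key for key, _ in groups]                        # sorted group keys: A, B, C
    buckets = [group[value_col].tolist() for _, group in groups]
    rows = []
    for counter, combo in enumerate(itertools.product(*buckets), start=1):
        # Emit one row per bucket for this combination
        for key, value in zip(keys, combo):
            rows.append({group_col: key, value_col: value,
                         'permutation_counter': counter})
    return pd.DataFrame(rows)


df = pd.DataFrame({'planid': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'x': ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']})
result = permutation_frame(df)
print(result.head(6))
```

The counter follows itertools.product order (the last bucket varies fastest), so the numbering differs from the order listed in the question while the 8 combinations themselves are identical.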
Answer 1:
I was trying to chain as many steps together as possible. Break them down to see what each step does :)
import pandas as pd
groups = df.groupby('planid')
# Level names come from the group keys, so they always line up with the buckets even if df is unsorted
df2 = pd.DataFrame(index=pd.MultiIndex.from_product([subdf['x'] for p, subdf in groups], names=[p for p, subdf in groups])).reset_index().stack().reset_index()
df2.columns = ['permutation_counter', 'planid', 'x']
df2['permutation_counter'] += 1
print(df2[['planid', 'x', 'permutation_counter']])
planid x permutation_counter
0 A a1 1
1 B b1 1
2 C c1 1
3 A a1 2
4 B b1 2
5 C c2 2
6 A a1 3
7 B b2 3
8 C c1 3
9 A a1 4
10 B b2 4
11 C c2 4
12 A a2 5
13 B b1 5
14 C c1 5
15 A a2 6
16 B b1 6
17 C c2 6
18 A a2 7
19 B b2 7
20 C c1 7
21 A a2 8
22 B b2 8
23 C c2 8
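Following the suggestion to break the chain down, here is a sketch of the same pipeline split into named steps (idx, wide and long are illustrative names). It assumes the sample DataFrame from the question; recent pandas releases may emit a FutureWarning about the legacy stack() implementation, which does not change the values here.

```python
import pandas as pd

df = pd.DataFrame({'planid': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'x': ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']})

# Step 1: a MultiIndex with one level per planid, holding every combination of x values
groups = df.groupby('planid')
idx = pd.MultiIndex.from_product([g['x'] for _, g in groups],
                                 names=[key for key, _ in groups])
print(len(idx))  # 8 combinations

# Step 2: an empty frame on that index; reset_index turns the levels into columns A, B, C
wide = pd.DataFrame(index=idx).reset_index()
print(wide.shape)  # (8, 3): one row per combination

# Step 3: stack moves columns A, B, C into the row index, giving a Series of 24 values
long = wide.stack()

# Step 4: reset_index exposes (row number, column name, value) as three columns
df2 = long.reset_index()
df2.columns = ['permutation_counter', 'planid', 'x']
df2['permutation_counter'] += 1  # number combinations from 1 instead of 0
print(df2.head(3))
```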
Answer 2:
@Happy001 beat me by a couple of minutes, but I'll post this anyway because I think it's a little easier to follow:
import itertools
import numpy as np
import pandas as pd
# One draw from each bucket (A, B, C); the values are typed in by hand here
x = list(itertools.product(['a1', 'a2'], ['b1', 'b2'], ['c1', 'c2']))
# Flatten the 8 tuples into a single list of 24 values
x = list(itertools.chain(*x))
df = pd.DataFrame({'p_count': np.repeat(range(1, 9), 3),   # 1,1,1,2,2,2,...
                   'planid': np.tile(list('ABC'), 8),       # A,B,C,A,B,C,...
                   'x': x})
Results:
p_count planid x
0 1 A a1
1 1 B b1
2 1 C c1
3 2 A a1
4 2 B b1
5 2 C c2
...
21 8 A a2
22 8 B b2
23 8 C c2
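The version above hard-codes the buckets, the 8 and the 3. A sketch that derives all three from the data, assuming the question's df (keys, buckets, combos and out are illustrative names), keeps the same tile/repeat idea:

```python
import itertools

import numpy as np
import pandas as pd

df = pd.DataFrame({'planid': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'x': ['a1', 'a2', 'b1', 'b2', 'c1', 'c2']})

# Derive the buckets from the data instead of typing them in
groups = df.groupby('planid')
keys = [key for key, _ in groups]
buckets = [g['x'].tolist() for _, g in groups]

combos = list(itertools.product(*buckets))
n_combos, n_keys = len(combos), len(keys)

out = pd.DataFrame({
    'p_count': np.repeat(np.arange(1, n_combos + 1), n_keys),  # 1,1,1,2,2,2,...
    'planid': np.tile(keys, n_combos),                         # A,B,C,A,B,C,...
    'x': list(itertools.chain.from_iterable(combos)),          # flattened tuples
})
print(out.tail(3))
```

Writing into out rather than reassigning df keeps the input frame available for later steps.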
Source: https://stackoverflow.com/questions/35518308/all-possible-permutations-columns-pandas-dataframe-within-the-same-column