Cartesian product of a pandas dataframe with itself

有刺的猬 2020-12-10 19:59

Given a dataframe:

    id  value
0    1     a
1    2     b
2    3     c

I want to get a new dataframe that is basically the Cartesian product ...

3 answers
  • 2020-12-10 20:30

    I had this problem before; this is my solution:

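    import itertools
    import pandas as pd

    # DF is the original dataframe from the question
    c = list(itertools.product(DF.id.tolist(), DF.id.tolist()))  # all ordered id pairs
    Dic = dict(zip(DF.id, DF.value))  # id -> value lookup
    df = pd.DataFrame(c, columns=['id', 'id_2'])
    df[['value', 'value_2']] = df.apply(lambda x: x.map(Dic))  # look up the value for each id column
    df.loc[df.id != df.id_2, :]  # drop pairs of a row with itself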
    
    
    Out[125]: 
       id  id_2 value value_2
    1   1     2     a       b
    2   1     3     a       c
    3   2     1     b       a
    5   2     3     b       c
    6   3     1     c       a
    7   3     2     c       b
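
A self-contained variant of the same idea, offered as a sketch rather than as the answer's exact method: it swaps `itertools.product` plus a filter for `itertools.permutations`, which yields ordered pairs of distinct rows directly. The names `source`, `pairs` and `result` are made up for illustration. Note that `permutations` excludes pairs by row position, not by `id`, so the two approaches only agree when the ids are unique.

```python
import itertools
import pandas as pd

# Example frame from the question
source = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# Each row as a plain tuple, e.g. (1, 'a')
rows = list(source.itertuples(index=False, name=None))

# permutations(rows, 2) yields every ordered pair of rows at different positions
pairs = [left + right for left, right in itertools.permutations(rows, 2)]

result = pd.DataFrame(pairs, columns=["id", "value", "id_2", "value_2"])
print(result)
```

This avoids building the dictionary lookup, at the cost of materialising every row pair as a Python tuple.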
    
  • 2020-12-10 20:42

    This can be done entirely in pandas:

    df.loc[:, 'key_col'] = 1 # create a join column that will give us the Cartesian Product
    
    (df.merge(df, on='key_col', suffixes=('', '_2'))
     .query('id != id_2') # filter out joins on the same row
     .drop('key_col', axis=1)
     .reset_index(drop=True))
    

    Or if you don't want to have to drop the dummy column, you can temporarily create it when calling df.merge:

    (df.merge(df, on=df.assign(key_col=1)['key_col'], suffixes=('', '_2'))
     .query('id != id_2') # filter out joins on the same row
     .reset_index(drop=True))
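
On pandas 1.2 or later, the dummy join column may not be needed at all, because `merge` accepts `how="cross"`. The sketch below assumes that version and the example frame from the question; the name `result` is only for illustration.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# how="cross" pairs every row of the left frame with every row of the right frame
result = (
    df.merge(df, how="cross", suffixes=("", "_2"))
    .query("id != id_2")      # filter out pairs of a row with itself
    .reset_index(drop=True)
)
print(result)
```

Unlike the first snippet above, this leaves `df` itself unmodified, since no helper column is ever written to it.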
    
  • 2020-12-10 20:44

    We want to get the indices for the upper and lower triangles of a square matrix, or in other words, the positions where the identity matrix is zero.

    np.eye(len(df))
    
    array([[ 1.,  0.,  0.],
           [ 0.,  1.,  0.],
           [ 0.,  0.,  1.]])
    

    So I subtract it from 1 and get:

    array([[ 0.,  1.,  1.],
           [ 1.,  0.,  1.],
           [ 1.,  1.,  0.]])
    

    Interpreted as booleans and passed to np.where, this gives exactly the upper and lower triangle indices.

    i, j = np.where(1 - np.eye(len(df)))
    df.iloc[i].reset_index(drop=True).join(
        df.iloc[j].reset_index(drop=True), rsuffix='_2')
    
       id value  id_2 value_2
    0   1     a     2       b
    1   1     a     3       c
    2   2     b     1       a
    3   2     b     3       c
    4   3     c     1       a
    5   3     c     2       b
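
To make the intermediate step concrete, here is a small sketch that prints the index arrays `np.where` returns for the 3-row example frame. The name `mask` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Example frame from the question
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

mask = 1 - np.eye(len(df))  # 1 off the diagonal, 0 on it
i, j = np.where(mask)       # row and column positions of the nonzero entries

print(i)  # [0 0 1 1 2 2]
print(j)  # [1 2 0 2 0 1]
```

Each `(i[k], j[k])` pair names a left row and a right row by position, which is why `df.iloc[i]` and `df.iloc[j]` line up once their indices are reset.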
    