cartesian product in pandas

再見小時候 2020-11-21 23:35

I have two pandas DataFrames and want their Cartesian product:

from pandas import DataFrame
df1 = DataFrame({'col1':[1,2],'col2':[3,4]})
df2 = DataFrame({'col3':[5,6]})

11 Answers
  • 2020-11-21 23:56

    If you have no overlapping columns, don't want to add one, and the indices of the data frames can be discarded, this may be easier:

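    # A pandas Index is immutable, so assign a new constant index
    # instead of writing into index[:]
    df1.index = [0] * len(df1)
    df2.index = [0] * len(df2)
    df_cartesian = df1.join(df2, how='outer')
    df_cartesian = df_cartesian.reset_index(drop=True)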
    
  • 2020-11-21 23:59

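    This one needs very little code. Add a constant 'key' column to both frames and merge on it:

    # Note: this adds a 'key' column to df1 and df2 in place
    df1['key'] = 0
    df2['key'] = 0
    
    # Every row shares key 0, so each df1 row matches each df2 row
    df_cartesian = df1.merge(df2, on='key', how='outer').drop(columns='key')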
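
    On newer pandas versions the helper column may not be needed at all. The sketch below assumes pandas 1.2 or later, where merge accepts how="cross"; the second part is a fallback for older versions that uses assign so the input frames are left unchanged. The name df_fallback is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
df2 = pd.DataFrame({"col3": [5, 6]})

# how="cross" (assumes pandas >= 1.2) pairs every row of df1 with every
# row of df2 without adding a helper column to either frame.
df_cartesian = df1.merge(df2, how="cross")

# Fallback for older pandas: assign() returns copies, so df1 and df2
# keep their original columns.
df_fallback = (
    df1.assign(key=0)
    .merge(df2.assign(key=0), on="key")
    .drop(columns="key")
)

print(df_cartesian)
```

    Both variants should produce the same four rows, one for each pairing of a df1 row with a df2 row.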
    
  • 2020-11-22 00:00

    If you have a key that is repeated for each row, then you can produce a cartesian product using merge (like you would in SQL).

    from pandas import DataFrame, merge
    df1 = DataFrame({'key':[1,1], 'col1':[1,2],'col2':[3,4]})
    df2 = DataFrame({'key':[1,1], 'col3':[5,6]})
    
    merge(df1, df2, on='key')[['col1', 'col2', 'col3']]
    

    Output:

       col1  col2  col3
    0     1     3     5
    1     1     3     6
    2     2     4     5
    3     2     4     6
    

    See here for the documentation: http://pandas.pydata.org/pandas-docs/stable/merging.html#brief-primer-on-merge-methods-relational-algebra

  • 2020-11-22 00:04

    Use pd.MultiIndex.from_product as the index of an otherwise empty dataframe, then reset the index:

    import pandas as pd

    a = [1, 2, 3]
    b = ["a", "b", "c"]
    
    # Every (a, b) combination becomes one index entry
    index = pd.MultiIndex.from_product([a, b], names=["a", "b"])
    
    # reset_index turns the index levels into ordinary columns
    pd.DataFrame(index=index).reset_index()
    

    Output:

       a  b
    0  1  a
    1  1  b
    2  1  c
    3  2  a
    4  2  b
    5  2  c
    6  3  a
    7  3  b
    8  3  c
    
  • 2020-11-22 00:08

    You could start by taking the Cartesian product of df1.col1 and df2.col3, then merge back to df1 to get col2.

    Here's a general Cartesian product function which takes a dictionary of lists:

    import pandas as pd

    def cartesian_product(d):
        # d maps each output column name to an iterable of its values
        index = pd.MultiIndex.from_product(d.values(), names=list(d.keys()))
        return pd.DataFrame(index=index).reset_index()
    

    Apply as:

    res = cartesian_product({'col1': df1.col1, 'col3': df2.col3})
    pd.merge(res, df1, on='col1')
    #  col1 col3 col2
    # 0   1    5    3
    # 1   1    6    3
    # 2   2    5    4
    # 3   2    6    4
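
    Merging back on col1 relies on col1 being unique in df1; if a value repeats, the merge matches it more than once and returns extra rows. A minimal sketch that pairs rows by position instead, so no column has to be unique. The helper name cartesian_by_position is hypothetical, and the sketch assumes the two frames share no column names.

```python
import numpy as np
import pandas as pd

def cartesian_by_position(left, right):
    """Pair every row of `left` with every row of `right` by row position."""
    # Each left position is repeated len(right) times: 0, 0, ..., 1, 1, ...
    left_pos = np.repeat(np.arange(len(left)), len(right))
    # The right positions cycle len(left) times: 0, 1, ..., 0, 1, ...
    right_pos = np.tile(np.arange(len(right)), len(left))
    return pd.concat(
        [
            left.iloc[left_pos].reset_index(drop=True),
            right.iloc[right_pos].reset_index(drop=True),
        ],
        axis=1,
    )

df1 = pd.DataFrame({"col1": [1, 1], "col2": [3, 4]})  # col1 repeats on purpose
df2 = pd.DataFrame({"col3": [5, 6]})

res = cartesian_by_position(df1, df2)
print(res)
```

    With the duplicated col1 above, this still returns exactly len(df1) * len(df2) rows.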
    
  • 2020-11-22 00:10

    As an alternative, one can rely on the cartesian product provided by itertools: itertools.product, which avoids creating a temporary key or modifying the index:

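    import numpy as np
    import pandas as pd
    import itertools
    
    def cartesian(df1, df2):
        # Pair every row of df1 with every row of df2
        rows = itertools.product(df1.iterrows(), df2.iterrows())
    
        # Series.append was removed in pandas 2.0; pd.concat joins the two rows instead
        df = pd.DataFrame(pd.concat([left, right]) for (_, left), (_, right) in rows)
        return df.reset_index(drop=True)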
    

    Quick test:

    In [46]: a = pd.DataFrame(np.random.rand(5, 3), columns=["a", "b", "c"])
    
    In [47]: b = pd.DataFrame(np.random.rand(5, 3), columns=["d", "e", "f"])    
    
    In [48]: cartesian(a,b)
    Out[48]:
               a         b         c         d         e         f
    0   0.436480  0.068491  0.260292  0.991311  0.064167  0.715142
    1   0.436480  0.068491  0.260292  0.101777  0.840464  0.760616
    2   0.436480  0.068491  0.260292  0.655391  0.289537  0.391893
    3   0.436480  0.068491  0.260292  0.383729  0.061811  0.773627
    4   0.436480  0.068491  0.260292  0.575711  0.995151  0.804567
    5   0.469578  0.052932  0.633394  0.991311  0.064167  0.715142
    6   0.469578  0.052932  0.633394  0.101777  0.840464  0.760616
    7   0.469578  0.052932  0.633394  0.655391  0.289537  0.391893
    8   0.469578  0.052932  0.633394  0.383729  0.061811  0.773627
    9   0.469578  0.052932  0.633394  0.575711  0.995151  0.804567
    10  0.466813  0.224062  0.218994  0.991311  0.064167  0.715142
    11  0.466813  0.224062  0.218994  0.101777  0.840464  0.760616
    12  0.466813  0.224062  0.218994  0.655391  0.289537  0.391893
    13  0.466813  0.224062  0.218994  0.383729  0.061811  0.773627
    14  0.466813  0.224062  0.218994  0.575711  0.995151  0.804567
    15  0.831365  0.273890  0.130410  0.991311  0.064167  0.715142
    16  0.831365  0.273890  0.130410  0.101777  0.840464  0.760616
    17  0.831365  0.273890  0.130410  0.655391  0.289537  0.391893
    18  0.831365  0.273890  0.130410  0.383729  0.061811  0.773627
    19  0.831365  0.273890  0.130410  0.575711  0.995151  0.804567
    20  0.447640  0.848283  0.627224  0.991311  0.064167  0.715142
    21  0.447640  0.848283  0.627224  0.101777  0.840464  0.760616
    22  0.447640  0.848283  0.627224  0.655391  0.289537  0.391893
    23  0.447640  0.848283  0.627224  0.383729  0.061811  0.773627
    24  0.447640  0.848283  0.627224  0.575711  0.995151  0.804567
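
    Building one Series per row pair can get slow on larger frames. A possible variant of the same itertools.product idea that works on plain tuples instead; the function name cartesian_tuples is hypothetical, and the sketch assumes the two frames have distinct column names.

```python
import itertools
import pandas as pd

def cartesian_tuples(left, right):
    """Cartesian product of two frames built from plain row tuples."""
    # itertuples(index=False, name=None) yields each row as a plain tuple
    pairs = itertools.product(
        left.itertuples(index=False, name=None),
        right.itertuples(index=False, name=None),
    )
    # Concatenating the two tuples gives one combined output row
    rows = [l_row + r_row for l_row, r_row in pairs]
    return pd.DataFrame(rows, columns=[*left.columns, *right.columns])

a = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
b = pd.DataFrame({"c": ["x", "y"]})

out = cartesian_tuples(a, b)
print(out)
```

    The row order matches the output above: all pairings for the first row of the left frame come first, then those for the second row, and so on.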
    