Cartesian product in pandas

再見小時候 2020-11-21 23:35

I have two pandas DataFrames:

from pandas import DataFrame
df1 = DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = DataFrame({'col3': [5, 6]})

How can I get their Cartesian product, with every row of df1 paired with every row of df2?
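For reference, the result being asked for pairs every row of df1 with every row of df2. A plain-Python sketch using itertools.product spells it out; it is slow on large frames but handy for checking the answers below. The names rows and expected are illustrative only.

```python
import itertools

import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col3': [5, 6]})

# Pair every row of df1 with every row of df2 (df1 rows vary slowest)
rows = [
    {**r1, **r2}
    for r1, r2 in itertools.product(
        df1.to_dict('records'), df2.to_dict('records')
    )
]
expected = pd.DataFrame(rows)
print(expected)
```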

11 answers
  • 2020-11-22 00:12

    With method chaining:

    product = (
        df1.assign(key=1)
        .merge(df2.assign(key=1), on="key")
        .drop("key", axis=1)
    )
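On pandas 1.2 and later, merge also accepts how='cross', which produces the same product without the temporary key column. A minimal sketch, assuming pandas >= 1.2 and the frames from the question:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col3': [5, 6]})

# Cross join: no helper column needed (pandas >= 1.2)
product = df1.merge(df2, how='cross')

# The constant-key version, for comparison
keyed = df1.assign(key=1).merge(df2.assign(key=1), on='key').drop('key', axis=1)
print(product)
print(product.equals(keyed))
```

The constant-key version still works on older pandas; both keep df1's row order, with df2's rows cycling fastest.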
    
  • 2020-11-22 00:12

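    If what you need is every pairwise product of two numeric Series (an outer product laid out as a matrix), you can use NumPy, which could be faster. Suppose you have two Series:

    import numpy as np
    import pandas as pd

    s1 = pd.Series(np.random.randn(100))
    s2 = pd.Series(np.random.randn(100))
    

    Then you just need:

    # (100, 1) @ (1, 100) -> (100, 100) matrix holding s1[i] * s2[j] at row i, column j
    pd.DataFrame(
        s1.to_numpy()[:, None] @ s2.to_numpy()[None, :],
        index=s1.index, columns=s2.index,
    )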
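The snippet above multiplies values pairwise (an outer product); it does not pair up DataFrame rows. If the goal is the row-wise Cartesian product with NumPy doing the index work, one possible sketch repeats and tiles row positions and then selects with iloc. The function name cartesian_np is hypothetical, and whether it beats merge depends on your data and pandas version.

```python
import numpy as np
import pandas as pd


def cartesian_np(left, right):
    """Row-wise Cartesian product of two DataFrames via NumPy position arrays."""
    n, m = len(left), len(right)
    # Left positions 0,0,...,1,1,...: each left row repeated m times
    li = np.repeat(np.arange(n), m)
    # Right positions 0,1,...,m-1 cycled n times
    ri = np.tile(np.arange(m), n)
    return pd.concat(
        [left.iloc[li].reset_index(drop=True),
         right.iloc[ri].reset_index(drop=True)],
        axis=1,
    )


df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col3': [5, 6]})
print(cartesian_np(df1, df2))
```

Columns with the same name on both sides would come out duplicated, since pd.concat does not rename them.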
    
  • 2020-11-22 00:13

    I find pandas MultiIndex to be the best tool for the job. If you have a list of lists, lists_list, call pd.MultiIndex.from_product(lists_list) and iterate over the result (or use it as a DataFrame index).
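A short sketch of the MultiIndex approach; lists_list here is a hypothetical input made of two small lists, and the level names are illustrative:

```python
import pandas as pd

lists_list = [[1, 2], ['a', 'b', 'c']]
mi = pd.MultiIndex.from_product(lists_list, names=['num', 'letter'])

# Iterating yields one tuple per combination
for combo in mi:
    print(combo)

# Or turn the levels into ordinary columns, one row per combination
combos = mi.to_frame(index=False)
print(combos)
```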

  • 2020-11-22 00:15

    This won't win a code golf competition and borrows from the previous answers, but it clearly shows how the key is added and how the join works. It creates two new DataFrames from lists, then adds a constant key column to take the Cartesian product on.

    My use case was that I needed a list of all store IDs for each week in my list. So I created a list of all the weeks I wanted, then a list of all the store IDs I wanted to map them against.

    I chose a left merge, but it is semantically the same as an inner merge in this setup. The pandas documentation on merging explains why: if a key combination appears more than once in both tables, the result is the Cartesian product of the matching rows, which is exactly what the constant key sets up.

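    import pandas as pd

    days = pd.DataFrame({'date': list_of_days})
    stores = pd.DataFrame({'store_id': list_of_stores})
    # The same constant key on both sides makes every day match every store
    stores['key'] = 0
    days['key'] = 0
    days_and_stores = days.merge(stores, how='left', on='key')
    # drop(columns=...) avoids the positional axis argument that pandas 2.0 rejects
    days_and_stores = days_and_stores.drop(columns='key')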
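A runnable version of the same steps, with hypothetical stand-ins for list_of_days and list_of_stores (three weekly dates and two store IDs), checks both claims: the row count is the product of the input lengths, and left and inner merges give the same frame when every row shares one constant key.

```python
import pandas as pd

# Hypothetical inputs standing in for list_of_days / list_of_stores
list_of_days = pd.date_range('2020-01-06', periods=3, freq='W-MON')
list_of_stores = [101, 102]

days = pd.DataFrame({'date': list_of_days}).assign(key=0)
stores = pd.DataFrame({'store_id': list_of_stores}).assign(key=0)

left = days.merge(stores, how='left', on='key').drop(columns='key')
inner = days.merge(stores, how='inner', on='key').drop(columns='key')

# Every day is paired with every store: 3 * 2 rows
print(len(left))
# With one constant key on both sides, left and inner give identical results
print(left.equals(inner))
```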
    
  • 2020-11-22 00:16

    Here is a helper function that computes the Cartesian product of two DataFrames. It merges on a temporary constant key column, prepending underscores to the key's name until it collides with no existing column, so a column that happens to be named "key" on either side is left untouched.

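    import pandas as pd
    
    def cartesian(df1, df2):
        """Determine Cartesian product of two data frames."""
        # Prepend underscores until the temporary key name matches no existing column
        key = 'key'
        while key in df1.columns or key in df2.columns:
            key = '_' + key
        key_d = {key: 0}
        # Give both frames the same constant key, merge on it, then drop it
        return pd.merge(
            df1.assign(**key_d), df2.assign(**key_d), on=key).drop(key, axis=1)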
    
    # Two data frames, where the first happens to have a 'key' column
    df1 = pd.DataFrame({'number':[1, 2], 'key':[3, 4]})
    df2 = pd.DataFrame({'digit': [5, 6]})
    cartesian(df1, df2)
    

    shows:

       number  key  digit
    0       1    3      5
    1       1    3      6
    2       2    4      5
    3       2    4      6
    