cartesian product in pandas

前端未结

关注

 11  1808

I have two pandas dataframes:

from pandas import DataFrame
df1 = DataFrame({\'col1\':[1,2],\'col2\':[3,4]})
df2 = DataFrame({\'col3\':[5,6]})

相关标签:

11条回答

醉酒成梦

2020-11-22 00:12

With method chaining:

product = (
    df1.assign(key=1)
    .merge(df2.assign(key=1), on="key")
    .drop("key", axis=1)
)

0 讨论(0)

遥遥无期

2020-11-22 00:12
You can use numpy as it could be faster. Suppose you have two series as follows,
```
s1 = pd.Series(np.random.randn(100,))
s2 = pd.Series(np.random.randn(100,))
```
You just need,
```
pd.DataFrame(
    s1[:, None] @ s2[None, :], 
    index = s1.index, columns = s2.index
)
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
滥情空心

2020-11-22 00:13

I find using pandas MultiIndex to be the best tool for the job. If you have a list of lists lists_list, call pd.MultiIndex.from_product(lists_list) and iterate over the result (or use it in DataFrame index).

0 讨论(0)
发布评论:

提交评论
- 加载中...
轻奢々

2020-11-22 00:15
This won't win a code golf competition, and borrows from the previous answers - but clearly shows how the key is added, and how the join works. This creates 2 new data frames from lists, then adds the key to do the cartesian product on.

My use case was that I needed a list of all store IDs on for each week in my list. So, I created a list of all the weeks I wanted to have, then a list of all the store IDs I wanted to map them against.

The merge I chose left, but would be semantically the same as inner in this setup. You can see this in the documentation on merging, which states it does a Cartesian product if key combination appears more than once in both tables - which is what we set up.
```
days = pd.DataFrame({'date':list_of_days})
stores = pd.DataFrame({'store_id':list_of_stores})
stores['key'] = 0
days['key'] = 0
days_and_stores = days.merge(stores, how='left', on = 'key')
days_and_stores.drop('key',1, inplace=True)
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

耶瑟儿～

2020-11-22 00:16

Here is a helper function to perform a simple Cartesian product with two data frames. The internal logic handles using an internal key, and avoids mangling any columns that happen to be named "key" from either side.

import pandas as pd

def cartesian(df1, df2):
    """Determine Cartesian product of two data frames."""
    key = 'key'
    while key in df1.columns or key in df2.columns:
        key = '_' + key
    key_d = {key: 0}
    return pd.merge(
        df1.assign(**key_d), df2.assign(**key_d), on=key).drop(key, axis=1)

# Two data frames, where the first happens to have a 'key' column
df1 = pd.DataFrame({'number':[1, 2], 'key':[3, 4]})
df2 = pd.DataFrame({'digit': [5, 6]})
cartesian(df1, df2)

shows:

   number  key  digit
0       1    3      5
1       1    3      6
2       2    4      5
3       2    4      6

0 讨论(0)

上一页 1 2