问题
In SQL, you can join two tables without a key so that all records of both tables merge with each other. If pandas.concat()
or pandas.merge()
or some other pandas syntax supported this, it could help me with one step of a problem I am trying to solve. I found an outer join option on the help documentation, but I could not find an exact syntax to do what I wanted (join all records without a key).
To explain this a little better:
import pandas as pd
lunchmenupairs2 = [["pizza", "italian"],["lasagna", "italian"],["orange", "fruit"]]
teamcuisinepreferences2 = [["ian", "*"]]
lunchLabels = ["Food", "Type"]
teamLabels = ["Person", "Type"]
df1 = pd.DataFrame.from_records(lunchmenupairs2, columns=lunchLabels)
df2 = pd.DataFrame.from_records(teamcuisinepreferences2, columns=teamLabels)
print(df1)
print(df2)
Outputs these tables:
Food Type
0 pizza italian
1 lasagna italian
2 orange fruit
Person Type
0 ian *
I want the final result of the merge to be:
Person Type Food Type
0 ian * pizza italian
1 ian * lasagna italian
2 ian * orange fruit
Then I can easily drop the columns I don't want and move to the next step in the code I am working on. This doesn't work:
merged_data = pd.merge(left=df2,right=df1, how='outer')
Is there a way to do this type of DataFrame
merging?
回答1:
You can add a column to both dfs with a constant value,
>>>df1['joincol'] = 1
>>>df2['joincol'] = 1
>>>pd.merge(left=df2,right=df1, on='joincol', how='outer')
Person Type_x joincol Food Type_y
0 ian * 1 pizza italian
1 ian * 1 lasagna italian
2 ian * 1 orange fruit
then delete it afterward when you remove your other undesired columns.
回答2:
I'm sure there's a better approach than the while
method below, but this ought to do the trick:
df2_copy = df2.copy(deep=True)
# Basically repeat the same row in df2 until the lengths of the two dataframes are equal
while df2.shape[0] < df1.shape[0]:
df2 = df2.append(df2_copy)
# Join on the indices of each dataframe ([0,1,2] to [0,1,2])
together = df1.join(df2, lsuffix='_df1', rsuffix='_df2').drop_duplicates()
# Fill 'ian' down
together['Person'] = together['Person'].ffill(axis=0)
# Fill type2 down
together['Type_df2'] = together['Type_df2'].ffill(axis=0)
together
is now:
Food Type_df1 Person Type_df2
0 pizza italian ian *
1 lasagna italian ian *
2 orange fruit ian *
回答3:
Building on @EFT answer, I often need some combo of values and dates, solution below. Can easily be generalized.
df1=pd.DataFrame({'ticker':['a','b']})
df2=pd.DataFrame({'date':pd.date_range('2010-01-01','2010-03-01',freq='1M')})
pd.DataFrame({'ticker':df1['ticker'].unique(),'key':np.nan}).merge(pd.DataFrame({'date':df2['date'].unique(),'key':np.nan}),on='key').drop('key',1)
ticker date
0 a 2010-01-31
1 a 2010-02-28
2 b 2010-01-31
3 b 2010-02-28
回答4:
This is possible with cross-join
, introduced in Pandas 1.2.0. Simply run:
df1.merge(df2, how='cross')
来源:https://stackoverflow.com/questions/44201946/how-to-do-pandas-equivalence-of-sql-outer-join-without-a-key