I am trying to join two data frames but cannot get my head around the possibilities Python has to offer.
First dataframe:
ID MODEL REQUESTS ORDERS
1 G
I always found merge to be an easy way to do this:
df1.merge(df2[['MODEL', 'MAKE']], how = 'left')
However, I must admit it would not be as short and nice if you wanted to call the new column something other than 'MAKE'.
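If a different column name is needed, one possible approach is to rename the column on the lookup frame before merging. The sketch below is only an illustration: the contents of df1 are taken from the printed output further down this thread, the contents of df2 are assumed, and the new name 'BRAND' is a made-up example.

```python
import pandas as pd

# df1 mirrors the printed output shown later in the thread.
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "MODEL": ["Golf", "Passat", "Model 3", "M3"],
    "REQUESTS": [123, 34, 500, 5],
    "ORDERS": [4, 5, 8, 0],
})
# Assumed lookup table; "Passat" is deliberately missing.
df2 = pd.DataFrame({
    "MODEL": ["Golf", "Model 3", "M3"],
    "MAKE": ["Volkswagen", "Tesla", "BMW"],
})

# Rename MAKE on the lookup side only, then left-merge on the key.
# 'BRAND' is a hypothetical target name.
lookup = df2[["MODEL", "MAKE"]].rename(columns={"MAKE": "BRAND"})
result = df1.merge(lookup, on="MODEL", how="left")
print(result)
```

Passing on='MODEL' explicitly keeps the merge from silently using every shared column name as a key.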
I think you can use insert together with map, using a Series created from df2 (if a MODEL value has no match in df2, you get NaN):
df1.insert(2, 'MAKE', df1['MODEL'].map(df2.set_index('MODEL')['MAKE']))
print(df1)
   ID    MODEL        MAKE  REQUESTS  ORDERS
0   1     Golf  Volkswagen       123       4
1   2   Passat         NaN        34       5
2   3  Model 3       Tesla       500       8
3   4       M3         BMW         5       0
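One caveat with the map approach: the lookup Series needs a unique index, so repeated MODEL rows in df2 are a problem. A minimal sketch of one way around that, assuming it is acceptable to keep the first MAKE seen for each MODEL (the duplicated "Golf" row here is invented for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "MODEL": ["Golf", "Passat", "M3"]})
# Hypothetical df2 in which "Golf" appears twice.
df2 = pd.DataFrame({
    "MODEL": ["Golf", "Golf", "M3"],
    "MAKE": ["Volkswagen", "Volkswagen", "BMW"],
})

# Keep one row per MODEL so the lookup index is unique, then map.
lookup = df2.drop_duplicates(subset="MODEL").set_index("MODEL")["MAKE"]
df1.insert(2, "MAKE", df1["MODEL"].map(lookup))
print(df1)
```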
The join method acts very much like a VLOOKUP. It joins a column in the first dataframe with the index of the second dataframe, so you must set MODEL as the index in the second dataframe and grab only the MAKE column.
df1.join(df2.set_index('MODEL')['MAKE'], on='MODEL')
Take a look at the documentation for join as it actually uses the word VLOOKUP.
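For completeness, here is a self-contained sketch of the join approach. The sample frames are assumptions modeled on the output printed earlier in the thread, with df1 as the main table and df2 as the lookup table.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "MODEL": ["Golf", "Passat", "Model 3", "M3"],
    "REQUESTS": [123, 34, 500, 5],
    "ORDERS": [4, 5, 8, 0],
})
# Assumed lookup table; "Passat" has no entry.
df2 = pd.DataFrame({
    "MODEL": ["Golf", "Model 3", "M3"],
    "MAKE": ["Volkswagen", "Tesla", "BMW"],
})

# Index the lookup by MODEL and join df1's MODEL column against that index.
# join defaults to a left join, so every row of df1 is kept.
joined = df1.join(df2.set_index("MODEL")["MAKE"], on="MODEL")
print(joined)
```

Unlike insert, join appends MAKE as the last column and returns a new frame rather than modifying df1 in place.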
Although it is not the case here, there might be scenarios where df2 has more than two columns and you want to add just one of them to df1, using a specific column as the key. Here is some generic code that you may find useful.
df = pd.merge(df1, df2[['MODEL', 'MAKE']], on = 'MODEL', how = 'left')
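When using the merge above, it can help to see which rows found no match and to guard against duplicate keys in the lookup table. The sketch below uses merge's indicator and validate parameters. The sample data and the extra COUNTRY column are assumptions for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "MODEL": ["Golf", "Passat", "Model 3", "M3"],
})
# Hypothetical df2 with an extra column that should not be brought over.
df2 = pd.DataFrame({
    "MODEL": ["Golf", "Model 3", "M3"],
    "MAKE": ["Volkswagen", "Tesla", "BMW"],
    "COUNTRY": ["DE", "US", "DE"],
})

df = pd.merge(
    df1,
    df2[["MODEL", "MAKE"]],
    on="MODEL",
    how="left",
    validate="many_to_one",  # raises if MODEL is duplicated in df2
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Rows of df1 whose MODEL was not found in df2.
unmatched = df[df["_merge"] == "left_only"]
print(unmatched)
```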