问题
I have a code that uses a dataframe to look up a value (P) given it's column label (X):
df_1 = pd.DataFrame({'X': [1,2,3,1,1,2,1,3,2,1]})
df_2 = pd.DataFrame({ 1 : [1,2,3,4,1,2,3,4,1,2],
2 : [4,1,2,3,4,1,2,1,2,3],
3 : [2,3,4,1,2,3,4,1,2,5]})
df_1['P'] = df_2 .lookup(df_1.index, df_1['X'])
When I give it a label in df_1 but don't include that label in df_2, like this:
df_1 = pd.DataFrame({'X': [7,2,3,1,1,2,1,3,2,1]})
I get:
KeyError: 'One or more column labels was not found'
How can I skip those ones please, to get:
X P
0 7 NaN
1 2 1
2 3 4
3 1 4
4 1 1
5 2 1
6 1 3
7 3 1
8 2 2
9 1 2
回答1:
get
and default values
def get_lu(df):
def lu(i, j):
return df.get(j, {}).get(i, np.nan)
return lu
[*map(get_lu(df_2), df_1.index, df_1.X)]
[nan, 1, 4, 4, 1, 1, 3, 1, 2, 2]
Alternative
[df_2.get(j, {}).get(i, np.nan) for i, j in df_1.X.items()]
[nan, 1, 4, 4, 1, 1, 3, 1, 2, 2]
All together
df_1.assign(P=[df_2.get(j, {}).get(i, np.nan) for i, j in df_1.X.items()])
X P
0 7 NaN
1 2 1.0
2 3 4.0
3 1 4.0
4 1 1.0
5 2 1.0
6 1 3.0
7 3 1.0
8 2 2.0
9 1 2.0
Uglier version
df_1.assign(P=[df_2.rename_axis('X', 1).stack().get(x, np.nan) for x in df_1.X.items()])
X P
0 7 NaN
1 2 1.0
2 3 4.0
3 1 4.0
4 1 1.0
5 2 1.0
6 1 3.0
7 3 1.0
8 2 2.0
9 1 2.0
回答2:
From the document adding try
...except
result = []
for row, col in zip(df_1.index, df_1.X):
try :
result.append(df_2.loc[row, col])
except :
result.append(np.nan)
result
Out[135]: [nan, 1, 4, 4, 1, 1, 3, 1, 2, 2]
回答3:
A tad slower than @piRSquared, but using loc
+ lambda
:
>> df_1['P'] = df_1.apply(lambda x: df_2.loc[x.name, x.values[0]] if x.values[0] in df_2.columns else np.nan, axis=1)
>> df_1
X P
0 7 NaN
1 2 1.0
2 3 4.0
3 1 4.0
4 1 1.0
5 2 1.0
6 1 3.0
7 3 1.0
8 2 2.0
9 1 2.0
回答4:
this answer uses numpy and is fast...
import numpy as np
setup dataframes
df_1 = pd.DataFrame({'X': [7,2,3,1,1,2,1,3,2,1]})
df_2 = pd.DataFrame({ 1 : [1,2,3,4,1,2,3,4,1,2],
2 : [4,1,2,3,4,1,2,1,2,3],
3 : [2,3,4,1,2,3,4,1,2,5]})
-
# designate working columns
lookup_cols = [1, 2, 3]
key_col = 'X'
result_col = 'P'
# get key column values as an array
key = df_1[key_col].values
# make an array of nans to hold the lookup results
result = np.full(key.shape[0], np.nan)
# create a boolean array containing only valid lookup indexes
b = np.isin(key, lookup_cols)
# filter df_1 and df_2 with boolean array b
df_1b = df_1[b]
df_2b = df_2[b]
# lookup values using filtered dataframes
lup = df_2b.lookup(df_1b.index, df_1b[key_col])
# put the results into the result array at proper index locations using b
result[b] = lup
# assign the result array to the dataframe result column
df_1[result_col] = result
回答5:
If, rather than the index, I wanted to use another column from df_1, then piRSquared's answer becomes:
df_1 = pd.DataFrame({'M' : ['X','Y','Z','X','Y','F','Y'],
'N' : ['A','C','B','B','A','A','F']})
df_2 = pd.DataFrame({'A' : [1,2,3],
'B' : [4,1,2],
'C' : [2,3,4]},
index = ['X', 'Y', 'Z'])
def get_lu(df):
def lu(i, j):
return df.get(j, {}).get(i, np.nan)
return lu
df_1['O'] = [*map(get_lu(df_2), df_1.M, df_1.N)]
Which gives:
M N O
0 X A 1.0
1 Y C 3.0
2 Z B 2.0
3 X B 4.0
4 Y A 2.0
5 F A NaN
6 Y F NaN
来源:https://stackoverflow.com/questions/53417435/lookup-with-missing-labels