df_pairs:
  city1 city2
0   sfo   yyz
1   sfo   yvr
2   sfo   dfw
3   sfo   ewr
Output of df_pairs.to_dict('records'):
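To reproduce the frame locally, something like the following should work. This is only a sketch: the records list is an assumption read off the printed df_pairs table above, not the asker's pasted output.

```python
import pandas as pd

# Assumed records, reconstructed from the printed df_pairs table
records = [
    {"city1": "sfo", "city2": "yyz"},
    {"city1": "sfo", "city2": "yvr"},
    {"city1": "sfo", "city2": "dfw"},
    {"city1": "sfo", "city2": "ewr"},
]

# A list of dicts becomes one row per dict, one column per key
df_pairs = pd.DataFrame(records)
print(df_pairs)
```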
Give this a try. Take out the initial variables and get rid of the for loop:
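import numpy as np
import pandas as pd
# Look up the data for each side of the pair, indexed by (city1, city2)
a = pd.merge(df_pairs, data_df, left_on='city1', right_on='city', how='left').set_index(['city1', 'city2'])
b = pd.merge(df_pairs, data_df, left_on='city2', right_on='city', how='left').set_index(['city1', 'city2'])
# Drop the join key so only the numeric columns remain
del a['city']
del b['city']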
Now do each calculation once and sum across each row (axis=1):
diff_df = b - a
diff_df_sign = np.sign(diff_df)
diff_df_sign_pos = diff_df_sign.clip(lower=0).sum(axis=1)
diff_df_sign_neg = diff_df_sign.clip(upper=0).sum(axis=1)
diff_df_pos = diff_df.clip(lower=0).sum(axis=1)
diff_df_neg = diff_df.clip(upper=0).sum(axis=1)
Does this look like the output you want?
city1  city2
sfo    yyz      5
       yvr      5
       dfw      5
       ewr      4
dtype: float64

city1  city2
sfo    yyz      0
       yvr      0
       dfw      0
       ewr     -1
dtype: float64

city1  city2
sfo    yyz      45.83
       yvr      45.83
       dfw      75.38
       ewr      19.55
dtype: float64

city1  city2
sfo    yyz      0.0
       yvr      0.0
       dfw      0.0
       ewr     -1.1
dtype: float64
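For reference, here is a self-contained sketch of the approach above that can be run as-is. The real data_df isn't shown, so its contents and the metric_a / metric_b column names are made up, and lookup is a hypothetical helper added just to avoid repeating the merge.

```python
import numpy as np
import pandas as pd

# Toy inputs (assumed): the real data_df is not shown in the thread
df_pairs = pd.DataFrame({"city1": ["sfo", "sfo"], "city2": ["yyz", "ewr"]})
data_df = pd.DataFrame({
    "city": ["sfo", "yyz", "ewr"],
    "metric_a": [1.0, 3.0, 0.5],
    "metric_b": [2.0, 2.0, 4.0],
})

def lookup(key):
    """Hypothetical helper: data for one side of each pair, indexed by the pair."""
    merged = pd.merge(df_pairs, data_df, left_on=key, right_on="city", how="left")
    return merged.drop(columns="city").set_index(["city1", "city2"])

a = lookup("city1")
b = lookup("city2")

# One subtraction for all columns, then row-wise sums
diff_df = b - a
diff_df_sign = np.sign(diff_df)
diff_df_sign_pos = diff_df_sign.clip(lower=0).sum(axis=1)  # count of positive diffs
diff_df_sign_neg = diff_df_sign.clip(upper=0).sum(axis=1)  # minus count of negative diffs
diff_df_pos = diff_df.clip(lower=0).sum(axis=1)            # sum of positive diffs
diff_df_neg = diff_df.clip(upper=0).sum(axis=1)            # sum of negative diffs

print(diff_df_sign_pos, diff_df_sign_neg, diff_df_pos, diff_df_neg, sep="\n\n")
```

Clipping at zero before summing is what separates the positive and negative contributions without a loop over columns.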
Why don't you simply do this:
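# Drop the string key columns after merging so the frames can be subtracted
df_city1 = pd.merge(df_pairs[['city1']], data_df, left_on='city1', right_on='city', how='left').drop(columns=['city1', 'city'])
df_city2 = pd.merge(df_pairs[['city2']], data_df, left_on='city2', right_on='city', how='left').drop(columns=['city2', 'city'])
diff = df_city2.subtract(df_city1, fill_value=0)
# Mask by sign, then sum across each row
pos_sum = diff[diff >= 0].sum(axis=1)
neg_sum = diff[diff < 0].sum(axis=1)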
That way you avoid looping over all your columns, merging 2*(number of columns) times, not to mention the indexing, and then that complicated bit with np.sign and .clip. Your df_pairs and data_df have a one-to-one correspondence, right?
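If you aren't sure about that correspondence, one way to check it is to let the merge validate it. Below is a minimal sketch, again with a made-up data_df (the metric_a / metric_b columns and the values_for helper are hypothetical). Passing validate="many_to_one" makes pd.merge raise a MergeError if any city appears more than once in data_df.

```python
import pandas as pd

# Toy inputs (assumed): the real data_df is not shown in the thread
df_pairs = pd.DataFrame({"city1": ["sfo", "sfo"], "city2": ["yyz", "ewr"]})
data_df = pd.DataFrame({
    "city": ["sfo", "yyz", "ewr"],
    "metric_a": [1.0, 3.0, 0.5],
    "metric_b": [2.0, 2.0, 4.0],
})

def values_for(key):
    """Hypothetical helper: numeric columns for one side of each pair."""
    merged = pd.merge(
        df_pairs[[key]], data_df,
        left_on=key, right_on="city",
        how="left",
        validate="many_to_one",  # fails loudly if a city is duplicated in data_df
    )
    # Drop the string key columns so only numbers are subtracted
    return merged.drop(columns=[key, "city"])

diff = values_for("city2") - values_for("city1")

# Boolean masks leave NaN where the condition fails; sum skips NaN by default
pos_sum = diff[diff >= 0].sum(axis=1)
neg_sum = diff[diff < 0].sum(axis=1)

print(pos_sum, neg_sum, sep="\n\n")
```

Note that a city missing from data_df would not raise here; with how="left" it just produces a row of NaN, so it may be worth checking for that separately.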