Let's say I have a table that has dates and a value for each date (plus other columns). I can find the rows that have the same value on the same day by using
Brute forcing this:
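# Sort so that candidate near-duplicates end up on adjacent rows.
df_data = df_data.sort_values(['DAY', 'VALUE'])
df_data['Dup'] = False
prev_row, prev_idx = None, None
for idx, row in df_data.iterrows():
    if prev_row is not None:
        # Flag both rows when they are within 2 days and 10 units of value.
        if (abs(row['DAY'] - prev_row['DAY']) <= 2 and
                abs(row['VALUE'] - prev_row['VALUE']) <= 10):
            # Use .loc rather than chained indexing so the write hits df_data itself.
            df_data.loc[idx, 'Dup'] = True
            df_data.loc[prev_idx, 'Dup'] = True
    prev_row, prev_idx = row, idx
print(df_data)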
gives:
    DAY  MTH   YYY   VALUE    Dup
3     2   10  2016   50.00  False
2     6   11  2016   28.25  False
13    8    9  2016   16.00   True
15    8   11  2016   16.00   True
14    9   10  2016   16.00   True
12   13   11  2016  160.00   True
10   13    9  2016  170.00   True
11   13   10  2016  170.00   True
16   16   11  2016   25.00  False
17   21   11  2016   45.00  False
0    22    9  2016    8.25  False
1    22    9  2016   43.00  False
5    23   10  2016   30.00  False
18   23    9  2016   50.00   True
19   23   10  2016   50.00   True
20   23   11  2016   50.00   True
4    23   11  2016   90.00  False
6    24    8  2016   10.00   True
7    24    9  2016   10.00   True
8    24   10  2016   10.00   True
9    24   11  2016   10.00   True
Is this the desired outcome?
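If it is, the loop only ever compares each row with its predecessor in sorted order, so the same check can probably be done without iterating. Below is a minimal sketch of that idea using `diff()` and `shift()`. The function name `flag_near_duplicates` and the parameters `day_tol` and `value_tol` are my own names for illustration, and the sample frame reuses a few `DAY`/`VALUE` pairs from the output above. I have only reasoned through it on this small sample, so treat it as a starting point rather than a drop-in replacement.

```python
import pandas as pd


def flag_near_duplicates(df, day_tol=2, value_tol=10):
    """Flag rows whose neighbour in (DAY, VALUE) order lies within both tolerances."""
    out = df.sort_values(['DAY', 'VALUE']).copy()
    # Distance to the previous row in sorted order; the first row gets NaN,
    # and NaN <= tolerance evaluates to False.
    close_to_prev = (
        (out['DAY'].diff().abs() <= day_tol)
        & (out['VALUE'].diff().abs() <= value_tol)
    )
    # A row is flagged if it is close to its predecessor or to its successor.
    out['Dup'] = close_to_prev | close_to_prev.shift(-1, fill_value=False)
    return out


# A few rows taken from the table above (index labels kept for reference).
sample = pd.DataFrame(
    {'DAY': [13, 13, 8, 9, 8, 6, 2],
     'VALUE': [170.0, 160.0, 16.0, 16.0, 16.0, 28.25, 50.0]},
    index=[10, 12, 13, 14, 15, 2, 3],
)
result = flag_near_duplicates(sample)
print(result)
```

Like the loop, this only looks at adjacent rows after sorting, so two rows that are close in `DAY` and `VALUE` but separated by an unrelated row in the sort order will still be missed.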