I want to drop duplicates and keep the first value. The duplicates that want to be dropped is A = \'df\' .Here\'s my data
A B C D E
qw 1 3 1 1
er
Another idea, with the benefit of being more readable in my opinion, would be to only shift the index where df.A == "df"
and store the ids where the differences are equal to 1. These columns we drop with df.drop()
.
idx = df[df.A == "df"].index # [3, 4, 5, 6, 9, 10]
m = idx - np.roll(idx, 1) == 1 # [False, True, True, True, False, True]
df.drop(idx[m], inplace = True) # [4,5,6,10] <-- These we drop
Time comparison
Runs equally fast as jezrael using the test sample below.
1000 loops, best of 3: 1.38 ms per loop
1000 loops, best of 3: 1.38 ms per loop
Full example
import pandas as pd
import numpy as np
df = pd.DataFrame(
{'A': {0: 'qw', 1: 'er', 2: 'ew', 3: 'df', 4: 'df', 5: 'df', 6: 'df', 7: 'we',
8: 'we', 9: 'df', 10: 'df', 11: 'we', 12: 'qw'},
'B': {0: 1, 1: 2, 2: 4, 3: 34, 4: 2, 5: 3, 6: 4, 7: 2, 8: 4, 9: 34, 10: 3,
11: 4, 12: 2},
'C': {0: 3, 1: 4, 2: 8, 3: 34, 4: 5, 5: 3, 6: 4, 7: 5, 8: 4, 9: 9, 10: 3,
11: 7, 12: 2},
'D': {0: 1, 1: 2, 2: 44, 3: 34, 4: 2, 5: 7, 6: 7, 7: 5, 8: 4, 9: 34, 10: 9,
11: 4, 12: 7},
'E': {0: 1, 1: 6, 2: 4, 3: 34, 4: 2, 5: 3, 6: 4, 7: 2, 8: 4, 9: 34, 10: 3,
11: 4, 12: 2}}
)
idx = df[df.A == "df"].index
m = idx - np.roll(idx, 1) == 1
df.drop(idx[m], inplace = True)