I know this might be an old debate, but between pandas.drop and Python's del, which is better in terms of performance on a large dataset?
Using randomly generated data of about 1.6 GB (20000 x 10000 float64 values), df.drop appears to be faster than del, especially when removing multiple columns:
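import time
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(20000, 10000))  # about 1.6 GB of float64
t_1 = time.time()
# Caution: with the default axis=0, labels=[2, 4, 1000] drops the ROWS with those
# index labels; use columns=[2, 4, 1000] (or axis=1) to drop columns instead.
df.drop(labels=[2, 4, 1000], inplace=True)
t_2 = time.time()
print(t_2 - t_1)
0.9118959903717041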
Compared to:
df = pd.DataFrame(np.random.rand(20000,10000))
t_3 = time.time()
del df[2]
del df[4]
del df[1000]
t_4 = time.time()
print(t_4 - t_3)
4.052732944488525
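A like-for-like comparison should remove the same columns both ways, starting from identical data. The sketch below is one way to set that up, under stated assumptions: the frame is much smaller than the 1.6 GB one above so it runs quickly, the column labels in to_remove are hypothetical, and time.perf_counter is swapped in for time.time because it is a monotonic, higher-resolution timer. A single run is noisy, so treat any numbers it prints as rough; repeat the measurement (or use timeit) before drawing conclusions.

```python
import time

import numpy as np
import pandas as pd

# Assumption: a smaller frame than the 20000 x 10000 one above so the sketch
# runs quickly; scale n_rows/n_cols up to reproduce the large-data case.
n_rows, n_cols = 2000, 1000
to_remove = [2, 4, 500]  # hypothetical column labels to delete
base = pd.DataFrame(np.random.rand(n_rows, n_cols))

# drop: columns= makes the axis explicit (labels= alone would target rows)
df_drop = base.copy()
start = time.perf_counter()
df_drop.drop(columns=to_remove, inplace=True)
drop_seconds = time.perf_counter() - start

# del: one statement per column, applied to an identical fresh copy
df_del = base.copy()
start = time.perf_counter()
for col in to_remove:
    del df_del[col]
del_seconds = time.perf_counter() - start

print(f"drop: {drop_seconds:.4f} s, del: {del_seconds:.4f} s")

# Both approaches should leave exactly the same frame behind
pd.testing.assert_frame_equal(df_drop, df_del)
```

The final assertion guards against the benchmark silently measuring two different operations, such as a row drop versus a column delete.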
@Inder's comparison is not quite the same, since it doesn't use inplace=True.