python del vs pandas drop

孤街浪徒 2021-02-13 05:45

I know this might be an old debate, but between pandas drop and Python's del statement, which performs better on a large dataset?

I am

4 answers
  • 2021-02-13 05:50

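    I tested it on about 10 MB of stock data and got the following results.

    For drop, with the following code:

    import time

    t = time.time()
    d.drop(labels="2")  # axis defaults to 0 and inplace to False: returns a new DataFrame, d is unchanged
    print(time.time() - t)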
    

    0.003617525100708008

    For del, with the following code on the same column:

    t = time.time()
    del d[2]  # removes column 2 from d in place
    print(time.time() - t)
    

    the time I got was:

    0.0045168399810791016

    Reruns on different datasets and columns didn't make any significant difference.

  • 2021-02-13 05:58

    Summarizing a few points about functionality:

    • drop operates on both columns and rows; del operates on columns only.
    • drop can operate on multiple items at a time; del operates only on one at a time.
    • drop can operate in-place or return a copy; del is an in-place operation only.

    The documentation at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html has more details on drop's features.
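As a rough illustration of the first two points, here is a minimal sketch on a made-up three-column frame. The names df, without_row, without_cols and scratch are placeholders for this example only, not anything from the question.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# drop can remove rows (labels matched against the index) ...
without_row = df.drop(index=0)
# ... or several columns in a single call.
without_cols = df.drop(columns=["a", "b"])

# del removes columns only, one per statement.
scratch = df.copy()
del scratch["a"]
del scratch["b"]

print(without_row.shape)
print(list(without_cols.columns))
print(list(scratch.columns))
```

Both drop calls here return new frames, so df itself still has all three rows and columns afterwards.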

  • 2021-02-13 06:06

    With the drop method, passing inplace=False (the default) lets you create a subset DataFrame and leave the original DataFrame untouched. I believe del has no such option.
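A small sketch of that difference, using a made-up two-column frame (original, subset and working are names invented for this example). To get a non-destructive result with del, one option is to copy the frame first:

```python
import pandas as pd

original = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# inplace=False is the default: a new DataFrame is returned
# and `original` keeps both columns.
subset = original.drop(columns="b")

# del mutates the object it is applied to, so work on an explicit copy
# if the original has to survive.
working = original.copy()
del working["b"]

print(list(original.columns))
print(list(subset.columns))
print(list(working.columns))
```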

  • 2021-02-13 06:09

    Using randomly generated data of about 1.6 GB, it appears that df.drop is faster than del, especially over multiple columns:

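    import time
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.rand(20000, 10000))
    t_1 = time.time()
    # labels= is matched against the row index unless axis=1 (or columns=) is passed
    df.drop(labels=[2, 4, 1000], inplace=True)
    t_2 = time.time()
    print(t_2 - t_1)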
    

    0.9118959903717041

    Compared to:

    df = pd.DataFrame(np.random.rand(20000, 10000))
    t_3 = time.time()
    # each del statement removes a single column in place
    del df[2]
    del df[4]
    del df[1000]
    t_4 = time.time()
    print(t_4 - t_3)
    

    4.052732944488525

    @Inder's comparison is not quite the same since it doesn't use inplace=True.
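One more caveat when comparing the two: drop matches labels against the row index unless axis=1 or columns= is given, whereas del always removes a column. A like-for-like timing could look roughly like the sketch below. It is only a sketch under assumptions: the frame is much smaller than the 1.6 GB one so it runs quickly, and COLS, make_frame, time_drop and time_del are names invented for the example. Results will vary with machine, pandas version and frame shape.

```python
import time

import numpy as np
import pandas as pd

COLS = [2, 4, 100]  # hypothetical column labels to remove


def make_frame():
    # Deliberately small (2000 x 500) so the sketch finishes quickly.
    return pd.DataFrame(np.random.rand(2000, 500))


def time_drop():
    df = make_frame()
    start = time.perf_counter()
    df.drop(columns=COLS, inplace=True)  # columns=, so no row/column mix-up
    return time.perf_counter() - start, df


def time_del():
    df = make_frame()
    start = time.perf_counter()
    for col in COLS:
        del df[col]  # one column per statement
    return time.perf_counter() - start, df


drop_seconds, dropped = time_drop()
del_seconds, deleted = time_del()
print(f"drop: {drop_seconds:.6f} s")
print(f"del:  {del_seconds:.6f} s")
```

A single run like this is noisy; repeating each measurement several times and comparing the minimum or median gives a steadier picture.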
