Can't drop columns or slice dataframe using dask?

独厮守ぢ 2021-01-12 11:17

I am trying to use dask instead of pandas because I have a 2.6 GB CSV file. I load it and want to drop a column, but it seems that neither the drop method df.drop('column')

1 answer
  • 2021-01-12 11:28

    We implemented the drop method in this PR. This is available as of dask 0.7.0.

    In [1]: import pandas as pd
    
    In [2]: df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 2, 1]})
    
    In [3]: import dask.dataframe as dd
    
    In [4]: ddf = dd.from_pandas(df, npartitions=2)
    
    In [5]: ddf.drop('y', axis=1).compute()
    Out[5]: 
       x
    0  1
    1  2
    2  3
    

    Before that, you could get the same result by selecting the columns you want to keep by name, though this is less attractive if you have many columns.

    In [6]: ddf[['x']].compute()
    Out[6]: 
       x
    0  1
    1  2
    2  3
    