How should I get the shape of a dask dataframe?

醉话见心 2021-01-03 22:12

Calling .shape on a Dask DataFrame gives me the following error:

AttributeError: 'DataFrame' object has no attribute 'shape'

How should I get the shape of a dask dataframe?

5 Answers
  • 2021-01-03 22:18

    I know this is quite an old question, but I had the same issue and found a workaround that I want to record here.

    Assuming your data was originally saved in a CSV-like file, you can simply count the lines of that file (minus one for the header line). This relies on the file ending with a newline and on no field containing an embedded line break. Inspired by another answer, this is the solution I'm using:

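    import dask.dataframe as dd
    from itertools import repeat, takewhile

    def rawincount(filename):
        """Count newline bytes in a file by reading it in 1 MiB chunks."""
        with open(filename, 'rb') as f:
            # Read unbuffered chunks until read() returns b'' at end of file
            bufgen = takewhile(lambda x: x, (f.raw.read(1024 * 1024) for _ in repeat(None)))
            return sum(buf.count(b'\n') for buf in bufgen)

    filename = 'myHugeDataframe.csv'
    df = dd.read_csv(filename)
    # Rows = newline count minus the header line; columns come from Dask's metadata
    df_shape = (rawincount(filename) - 1, len(df.columns))
    print(f"Shape: {df_shape}")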
    

    Hope this could help someone else as well.
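
The raw newline count above can be off when the file has no trailing newline or when quoted fields contain line breaks. As a hedged alternative, here is a minimal sketch that parses the file with Python's standard csv module instead. The function name count_csv_rows and its has_header parameter are made up for this illustration, and the sketch assumes the file is readable with the default text encoding. It is slower than counting bytes, since every record is actually parsed.

```python
import csv

def count_csv_rows(filename, has_header=True):
    """Count data records in a CSV file by parsing it with the csv module."""
    # newline='' lets the csv module handle line breaks inside quoted fields
    with open(filename, newline='') as f:
        # Skip completely blank lines, which csv.reader yields as empty lists
        n_records = sum(1 for row in csv.reader(f) if row)
    # Subtract the header record, never going below zero
    return max(n_records - 1, 0) if has_header else n_records
```

The result could then replace rawincount(filename) - 1 in the shape tuple above, with len(df.columns) still supplying the column count.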

  • 2021-01-03 22:31

    To get the shape, we can try this:

     dask_dataframe.describe().compute()  

    The "count" row of the result gives the number of non-null values in each described column, which equals the number of rows for any column without missing values.

     len(dask_dataframe.columns) 

    This gives the number of columns in the dataframe.

  • 2021-01-03 22:33
    print(f"({len(df)}, {len(df.columns)})")  # len(df) triggers a full computation
    
  • 2021-01-03 22:34

    With shape you can do the following:

    a = df.shape
    a[0].compute(), a[1]

    This shows the shape just as pandas does. The row count a[0] is lazy and has to be computed, while the column count a[1] is already a plain integer.

  • 2021-01-03 22:36

    You can get the number of columns directly

    len(df.columns)  # this is fast
    

    You can also call len on the dataframe itself, though beware that this will trigger a computation.

    len(df)  # this requires a full scan of the data
    

    A Dask DataFrame doesn't know how many records are in your data without first reading through all of it.
