Creating a zero-filled pandas data frame

情书的邮戳 2020-12-07 14:36

What is the best way to create a zero-filled pandas data frame of a given size?

I have used:

zero_data = np.zeros(shape=(len(data), len(feature_list)))

6 Answers
  • 2020-12-07 14:47

    If you already have a DataFrame, this was the fastest way in my timings:

    In [1]: columns = ["col{}".format(i) for i in range(10)]
    In [2]: orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)
    In [3]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
    10000 loops, best of 3: 60.2 µs per loop
    

    Compare to:

    In [4]: %timeit d = pd.DataFrame(0, index = np.arange(10), columns=columns)
    10000 loops, best of 3: 110 µs per loop
    
    In [5]: temp = np.zeros((10, 10))
    In [6]: %timeit d = pd.DataFrame(temp, columns=columns)
    10000 loops, best of 3: 95.7 µs per loop
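
    For anyone re-running the comparison outside IPython, here is a minimal sketch using the standard timeit module. The helper names (from_template, from_scalar, from_array) are made up for illustration, and from_array allocates its zero array inside the timed call, unlike In [5] above. Absolute timings depend on the machine and the pandas version, so no numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

columns = ["col{}".format(i) for i in range(10)]
orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)


def from_template():
    # Zero array shaped like the template, reusing its index and columns
    return pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)


def from_scalar():
    # Broadcast the scalar 0 over an explicit index and column list
    return pd.DataFrame(0, index=np.arange(10), columns=columns)


def from_array():
    # Wrap a freshly allocated zero array
    return pd.DataFrame(np.zeros((10, 10)), columns=columns)


for fn in (from_template, from_scalar, from_array):
    seconds = timeit.timeit(fn, number=1000)
    print("{}: {:.1f} us per call".format(fn.__name__, seconds / 1000 * 1e6))
```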
    
  • 2020-12-07 14:54

    This assumes you have a template DataFrame and want a copy of it filled with zeros.

    If you have no NaNs in your data set, multiplying by zero can be significantly faster:

    In [19]: columns = ["col{}".format(i) for i in range(3000)]

    In [20]: indices = range(2000)
    
    In [21]: orig_df = pd.DataFrame(42.0, index=indices, columns=columns)
    
    In [22]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
    100 loops, best of 3: 12.6 ms per loop
    
    In [23]: %timeit d = orig_df * 0.0
    100 loops, best of 3: 7.17 ms per loop
    

    The improvement depends on the DataFrame size, but I have never found it slower.
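
    The "no NaNs" condition matters because NaN * 0 is NaN (and inf * 0 is NaN as well), so those cells do not become zero. A minimal sketch with a made-up three-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame containing one NaN and one infinity
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.inf]})

zeros_by_mul = df * 0.0
# NaN * 0.0 and inf * 0.0 both evaluate to NaN, so two cells are not zero
print(zeros_by_mul.isna().sum().sum())

# Building from the labels alone does not depend on the existing values
zeros_by_ctor = pd.DataFrame(0.0, index=df.index, columns=df.columns)
print((zeros_by_ctor == 0.0).all().all())
```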

    And just for the heck of it:

    In [24]: %timeit d = orig_df * 0.0 + 1.0
    100 loops, best of 3: 13.6 ms per loop
    
    In [25]: %timeit d = pd.eval('orig_df * 0.0 + 1.0')
    100 loops, best of 3: 8.36 ms per loop
    

    But:

    In [24]: %timeit d = orig_df.copy()
    10 loops, best of 3: 24 ms per loop
    

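    Edit:

    Assuming the frame uses float64, this was the fastest by a wide margin in my timings. Note that + binds more tightly than >, so the expression compares every value against the largest float64 plus 0.0. The result is therefore a frame of False values (boolean zeros) rather than float zeros, and changing 0.0 alters the comparison threshold, not the fill value.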

    In [23]: %timeit d = pd.eval('orig_df > 1.7976931348623157e+308 + 0.0')
    100 loops, best of 3: 3.68 ms per loop
    

    Depending on taste, you can define nan outside the expression for a general solution that does not depend on the particular float type (every comparison with NaN is False):

    In [39]: nan = np.nan
    In [40]: %timeit d = pd.eval('orig_df > nan + 0.0')
    100 loops, best of 3: 4.39 ms per loop
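
    A quick check of what the comparison trick returns, as a sketch on a small made-up frame. The frame is passed through local_dict so the snippet does not depend on where it is run. Because + binds more tightly than >, the expression is a single comparison and the result holds booleans. The astype(float) step is my own suggestion for getting float zeros and is not part of the timings above.

```python
import pandas as pd

# Small stand-in for the 2000 x 3000 frame used in the timings
orig_df = pd.DataFrame(42.0, index=range(4), columns=["col0", "col1", "col2"])

# Parsed as orig_df > (1.7976931348623157e+308 + 0.0):
# no value exceeds the float64 maximum, so every cell is False
d = pd.eval("orig_df > 1.7976931348623157e+308 + 0.0", local_dict={"orig_df": orig_df})
print(d.dtypes.tolist())

# Convert the all-False frame to float zeros if a numeric frame is needed
zeros = d.astype(float)
print(zeros.dtypes.tolist())
```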
    
  • 2020-12-07 14:57

    In my opinion, it's best to do this with numpy:

    import numpy as np
    import pandas as pd
    d = pd.DataFrame(np.zeros((N_rows, N_cols)))
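
    One thing to be aware of with this form: np.zeros defaults to float64, and the frame gets a default integer index and integer column labels. A sketch with made-up dimensions and column names showing how to pick the dtype and labels up front:

```python
import numpy as np
import pandas as pd

# Hypothetical dimensions and labels, chosen only for illustration
n_rows, n_cols = 3, 2
names = ["x", "y"]

# Default: float64 zeros, columns labelled 0..n_cols-1
d_float = pd.DataFrame(np.zeros((n_rows, n_cols)))

# Integer zeros with named columns
d_int = pd.DataFrame(np.zeros((n_rows, n_cols), dtype=np.int64), columns=names)

print(d_float.dtypes.tolist())
print(d_int.dtypes.tolist())
```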
    
  • 2020-12-07 15:03

    Similar to @Shravan, but without the use of numpy:

    import pandas as pd
    height = 10
    width = 20
    df_0 = pd.DataFrame(0, index=range(height), columns=range(width))
    

    Then you can do whatever you want with it:

    # Note: applymap was deprecated in pandas 2.1 in favor of DataFrame.map
    post_instantiation_fcn = lambda x: str(x)
    df_ready_for_whatever = df_0.applymap(post_instantiation_fcn)
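
    If the post-processing is just a type conversion like the str example, astype expresses the same conversion directly and is available across pandas versions. A minimal sketch (the smaller height and width are only to keep things short):

```python
import pandas as pd

height, width = 2, 3
df_0 = pd.DataFrame(0, index=range(height), columns=range(width))

# Convert every cell from the integer 0 to the string "0"
df_str = df_0.astype(str)
print(df_str.iloc[0, 0] == "0")
```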
    
  • 2020-12-07 15:08

    You can try this:

    d = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)
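
    Spelled out with stand-ins for the question's variables (data and feature_list here are made-up examples, since the question does not show them):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's `data` and `feature_list`
data = ["sample a", "sample b", "sample c"]
feature_list = ["length", "width"]

# One row per item in data, one zero-filled column per feature
d = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)

# Cells can then be filled in by label
d.loc[1, "width"] = 7
print(d.shape)
```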
    
  • 2020-12-07 15:08

    If you would like the new data frame to have the same index and columns as an existing data frame, you can just multiply the existing data frame by zero:

    df_zeros = df * 0
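
    A small sketch of what this does to labels and dtypes, using a made-up two-column frame. It assumes every column is numeric and free of NaN or inf values, since those stay NaN after multiplication.

```python
import pandas as pd

# Hypothetical frame with one integer and one float column
df = pd.DataFrame({"count": [1, 2, 3], "score": [0.5, 1.5, 2.5]}, index=["a", "b", "c"])

df_zeros = df * 0

# Labels are carried over and each column keeps its numeric dtype
print(df_zeros.index.tolist(), df_zeros.columns.tolist())
print(df_zeros.dtypes.tolist())
```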
    