pandas - Extend Index of a DataFrame setting all columns for new rows to NaN?

日久生厌 2021-02-05 03:58

I have time-indexed data:

df2 = pd.DataFrame({ 'day': pd.Series([date(2012, 1, 1), date(2012, 1, 3)]), 'b' : pd.Series([0.22, 0.3]) })
df2 = df2.set_index('day')


        
6 Answers
  • 2021-02-05 04:02

    Mark's answer no longer seems to work on pandas 1.1.1.

    However, using the same idea, the following works:

    from datetime import datetime
    import pandas as pd
    
    
    # get start and desired end dates
    first_date = df['date'].min()
    today = datetime.today()
    
    # set index
    df.set_index('date', inplace=True)
    
    # and here is where the magic happens
    idx = pd.date_range(first_date, today, freq='D')
    df = df.reindex(idx)
    

    EDIT: just found out that this exact use case is in the docs:

    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html#pandas.DataFrame.reindex

  • 2021-02-05 04:05

    This is not exactly what the question asks, since there you know that the second index is all days in January. But suppose you have another index, say from another DataFrame df1, which might be disjoint and have an irregular frequency. Then you can do this:

    ix = pd.DatetimeIndex(list(df2.index) + list(df1.index)).unique().sort_values()
    df2.reindex(ix)
    

    Converting indices to lists allows one to create a longer list in a natural way.
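As a possible alternative to concatenating lists, `Index.union` combines two indexes directly and returns the labels sorted and without duplicates. The sketch below assumes both frames already carry a `DatetimeIndex`; the contents of `df1` are invented purely for illustration.

```python
import pandas as pd

# Sample frames; df1 and its dates are hypothetical, for illustration only.
df2 = pd.DataFrame(
    {"b": [0.22, 0.3]},
    index=pd.DatetimeIndex(["2012-01-01", "2012-01-03"], name="day"),
)
df1 = pd.DataFrame(
    {"a": [1.0, 2.0]},
    index=pd.DatetimeIndex(["2012-01-03", "2012-01-10"], name="day"),
)

# Union of both indexes: sorted, with the shared date kept only once.
ix = df2.index.union(df1.index)

# Rows for dates that df2 did not have are filled with NaN.
extended = df2.reindex(ix)
print(extended)
```

Here `extended` has one row for each distinct date found in either frame, and only the dates that came from `df2` carry a value in `b`.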

  • 2021-02-05 04:14

    Here's another option: first add a NaN record on the last day you want, then call asfreq('D'). With an end point in place, asfreq fills in the missing dates for you.

    Starting Frame:

    import pandas as pd
    import numpy as np
    from datetime import date
    
    df2 = pd.DataFrame({ 'day': pd.Series([date(2012, 1, 1), date(2012, 1, 3)]), 'b' : pd.Series([0.22, 0.3]) })
    df2= df2.set_index('day')
    df2
    
    Out:
                      b
        day 
        2012-01-01  0.22
        2012-01-03  0.30
    

    Filled Frame:

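    df2.index = pd.to_datetime(df2.index)  # asfreq needs a DatetimeIndex, not plain date objects
    df2.loc[pd.Timestamp(2012, 1, 31), 'b'] = np.nan  # set_value() and np.float were removed from pandas/NumPy
    df2 = df2.asfreq('D')
    df2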
    
    Out:
                    b
        day 
        2012-01-01  0.22
        2012-01-02  NaN
        2012-01-03  0.30
        2012-01-04  NaN
        2012-01-05  NaN
        2012-01-06  NaN
        2012-01-07  NaN
        2012-01-08  NaN
        2012-01-09  NaN
        2012-01-10  NaN
        2012-01-11  NaN
        2012-01-12  NaN
        2012-01-13  NaN
        2012-01-14  NaN
        2012-01-15  NaN
        2012-01-16  NaN
        2012-01-17  NaN
        2012-01-18  NaN
        2012-01-19  NaN
        2012-01-20  NaN
        2012-01-21  NaN
        2012-01-22  NaN
        2012-01-23  NaN
        2012-01-24  NaN
        2012-01-25  NaN
        2012-01-26  NaN
        2012-01-27  NaN
        2012-01-28  NaN
        2012-01-29  NaN
        2012-01-30  NaN
        2012-01-31  NaN
    
  • 2021-02-05 04:20

    Use this (current as of pandas 1.1.3):

    ix = pd.date_range(start=date(2012, 1, 1), end=date(2012, 1, 31), freq='D')
    df2.reindex(ix)
    

    Which gives:

                   b
    2012-01-01  0.22
    2012-01-02   NaN
    2012-01-03  0.30
    2012-01-04   NaN
    2012-01-05   NaN
    [...]
    2012-01-29   NaN
    2012-01-30   NaN
    2012-01-31   NaN
    

    For older versions of pandas replace pd.date_range with pd.DatetimeIndex.
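A self-contained version of the same idea is sketched below. It assumes the index holds datetime64 values, which is why the `day` column is passed through `pd.to_datetime` first: `reindex` matches labels exactly, and an index of plain `datetime.date` objects may not line up with the `Timestamp` labels that `pd.date_range` produces.

```python
import pandas as pd

# Build the sample frame with a proper DatetimeIndex.
df2 = pd.DataFrame({"day": ["2012-01-01", "2012-01-03"], "b": [0.22, 0.3]})
df2["day"] = pd.to_datetime(df2["day"])
df2 = df2.set_index("day")

# Target index: every day in January 2012.
ix = pd.date_range(start="2012-01-01", end="2012-01-31", freq="D")

# Existing rows keep their values; all new rows get NaN in every column.
filled = df2.reindex(ix)
print(filled.head())
```

Note that `reindex` returns a new frame, so the result has to be assigned if it should be kept.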

  • 2021-02-05 04:24
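    import datetime

    def extendframe(df, ndays):
        """
        Return df reindexed so that, for every date in its index, the dates
        ndays earlier and ndays later are also present (new rows are NaN).
        """
        # shift the whole index back and forward by ndays
        ixd = df.index - datetime.timedelta(ndays)
        ixu = df.index + datetime.timedelta(ndays)
        # union of the original and shifted labels, then reindex onto it
        ixx = df.index.union(ixd.union(ixu))
        df_ = df.reindex(ixx)
        return df_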
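Note that the function above only adds the labels that lie exactly ndays before and after each existing date; it does not fill the days in between. If a continuous daily range is wanted instead, one possible variant is sketched below. The name `extend_daily` is hypothetical, and the sketch assumes the frame has a non-empty `DatetimeIndex`.

```python
import pandas as pd

def extend_daily(df, ndays):
    """Hypothetical variant: pad df by ndays on both ends, one row per day."""
    pad = pd.Timedelta(days=ndays)
    # Continuous daily range from (first date - ndays) to (last date + ndays).
    ix = pd.date_range(
        df.index.min() - pad,
        df.index.max() + pad,
        freq="D",
        name=df.index.name,
    )
    return df.reindex(ix)

df2 = pd.DataFrame(
    {"b": [0.22, 0.3]},
    index=pd.DatetimeIndex(["2012-01-01", "2012-01-03"], name="day"),
)
padded = extend_daily(df2, 2)
print(padded)
```

With the two sample dates and `ndays=2`, the result covers 2011-12-30 through 2012-01-05, with NaN everywhere except the two original rows.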
    
  • 2021-02-05 04:26

    You can use asfreq with a daily frequency. If you don't specify a fill method, the missing values are filled with NaN, as you wanted:

    df3 = df2.asfreq('D')
    df3
    
    Out[16]:
                   b
    2012-01-01  0.22
    2012-01-02   NaN
    2012-01-03  0.30
    

    To answer your second part, I can't think of a more elegant way at the moment:

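    df3 = pd.DataFrame({ 'day': pd.Series([date(2012, 1, 4), date(2012, 1, 31)])})
    df3.set_index('day',inplace=True)
    merged = pd.concat([df2, df3])  # DataFrame.append was removed in pandas 2.0
    merged.index = pd.to_datetime(merged.index)  # asfreq needs a DatetimeIndex
    merged = merged.asfreq('D')
    merged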
    
    
    Out[46]:
                   b
    2012-01-01  0.22
    2012-01-02   NaN
    2012-01-03  0.30
    2012-01-04   NaN
    2012-01-05   NaN
    2012-01-06   NaN
    2012-01-07   NaN
    2012-01-08   NaN
    2012-01-09   NaN
    2012-01-10   NaN
    2012-01-11   NaN
    2012-01-12   NaN
    2012-01-13   NaN
    2012-01-14   NaN
    2012-01-15   NaN
    2012-01-16   NaN
    2012-01-17   NaN
    2012-01-18   NaN
    2012-01-19   NaN
    2012-01-20   NaN
    2012-01-21   NaN
    2012-01-22   NaN
    2012-01-23   NaN
    2012-01-24   NaN
    2012-01-25   NaN
    2012-01-26   NaN
    2012-01-27   NaN
    2012-01-28   NaN
    2012-01-29   NaN
    2012-01-30   NaN
    2012-01-31   NaN
    

    This constructs a second time series and then we just append and call asfreq('D') as before.
