Pandas Reindex to Fill Missing Dates, or Better Method to Fill?

小鲜肉 2021-01-20 02:34

My data is absence records from a factory. Some days there are no absences so there is no data or date recorded for that day. However, and where this gets hairy with the o

2 Answers
  • 2021-01-20 03:18

    Actually, you were pretty close to what you wanted (assuming I correctly understood the output you are looking for). Here is your code with my additions:

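    import pandas as pd
    
    # Column 3 holds the dates (written day first); parse it and use it as the index
    ts = pd.read_csv('Absentee_Data_2.csv', encoding='utf-8', parse_dates=[3], index_col=3, dayfirst=True, sep=",")
    
    # Every calendar day of the period (ISO strings avoid any month/day ambiguity)
    idx = pd.date_range('2009-01-01', '2017-12-31')
    
    ts.index = pd.DatetimeIndex(ts.index)
    #ts = ts.reindex(idx, fill_value='NaN')  # not used; the join below adds the missing days
    # Left-join the data onto the full calendar: days without absences become all-NaN rows
    df = pd.DataFrame(index=idx)
    df1 = df.join(ts, how='left')
    df2 = df1.copy()
    df3 = df1.copy()
    df4 = df1.copy()
    # Fill the NaN cells of each copy with a different (Description, Shift) pair;
    # dict1 is modified in place between the fillna calls
    dict1 = {'Description': 'Discipline', 'Instances': 0, 'Shift': '1st Cooks'}
    df1 = df1.fillna(dict1)  # Discipline / 1st Cooks
    dict1["Description"] = "Vacation"
    df2 = df2.fillna(dict1)  # Vacation / 1st Cooks
    dict1["Shift"] = "2nd Baker"
    df3 = df3.fillna(dict1)  # Vacation / 2nd Baker
    dict1["Description"] = "Discipline"
    df4 = df4.fillna(dict1)  # Discipline / 2nd Baker
    # Stack the four copies; every row that came from the CSV now appears four times
    df_with_duplicates = pd.concat([df1, df2, df3, df4])
    # Deduplicate on date + filled columns (the date must be a column for this), then restore the date index
    final_res = (df_with_duplicates.reset_index()
                 .drop_duplicates(subset=["index"] + list(dict1.keys()))
                 .set_index("index")
                 .drop("Unexcused", axis=1))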
    

    Basically, here is what I added:

    • Make four copies of the mostly empty frame obtained by joining ts onto the full date range (df1)
    • fillna(dict1) fills every NaN in the listed columns with a static value
    • Concatenate the four frames. This leaves duplicates, because every row that came from the CSV now appears four times
    • Drop the duplicates. The date has to be part of the comparison, otherwise the filler rows added for different days would be treated as duplicates of each other; hence the reset_index() before and the set_index("index") after
    • Finally, drop the Unexcused column
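
    The steps above can be reproduced on a tiny, made-up frame. This is only a sketch: the toy `ts` (two absences on 2014-01-02, with invented `Unexcused` values) stands in for `Absentee_Data_2.csv`, and the four fills are written as a loop over the same four (Description, Shift) pairs instead of mutating `dict1`.

```python
import pandas as pd

# Made-up stand-in for Absentee_Data_2.csv: two absences on 2014-01-02, none on the other days
ts = pd.DataFrame(
    {'Description': ['Discipline', 'Vacation'],
     'Instances': [1, 2],
     'Shift': ['2nd Baker', '1st Cooks'],
     'Unexcused': ['Y', 'N']},  # invented values
    index=pd.to_datetime(['2014-01-02', '2014-01-02']),
)

# Full calendar; the left join leaves 2014-01-01 and 2014-01-03 as all-NaN rows
idx = pd.date_range('2014-01-01', '2014-01-03')
base = pd.DataFrame(index=idx).join(ts, how='left')

# One filled copy per (Description, Shift) pair, in the same order as the answer
combos = [('Discipline', '1st Cooks'), ('Vacation', '1st Cooks'),
          ('Vacation', '2nd Baker'), ('Discipline', '2nd Baker')]
copies = [base.fillna({'Description': d, 'Instances': 0, 'Shift': s}) for d, s in combos]

# Stack the copies, then drop the repeated CSV rows; the date ('index' column) is part of the key
stacked = pd.concat(copies).reset_index()
final_res = (stacked.drop_duplicates(subset=['index', 'Description', 'Instances', 'Shift'])
             .set_index('index')
             .drop('Unexcused', axis=1))
print(final_res)
```

    Each empty day receives four filler rows, while the two real records on 2014-01-02 survive once each, because their four identical copies collapse in `drop_duplicates` (10 rows in total).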

    Here is some sample output:

    In [5]: final_res.loc["2013-01-02"]
    Out[5]: 
               Description  Instances      Shift
    index                                       
    2013-01-02  Discipline        0.0  1st Cooks
    2013-01-02    Vacation        0.0  1st Cooks
    2013-01-02    Vacation        0.0  2nd Baker
    2013-01-02  Discipline        0.0  2nd Baker
    
    In [6]: final_res.loc["2014-01-02"]
    Out[6]: 
               Description  Instances       Shift
    index                                        
    2014-01-02  Discipline        1.0   2nd Baker
    2014-01-02    Vacation        2.0   1st Cooks
    2014-01-02  Discipline        3.0   2nd Baker
    2014-01-02    Vacation        1.0   1st Cooks
    
  • 2021-01-20 03:26

    I think your problem is just with how you handle the dates. This approach worked for me:

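    import pandas as pd
    
    ts.set_index(['Date'], inplace=True)
    # Parse date strings such as 'Jan 02 2014' into real datetimes
    ts.index = pd.to_datetime(ts.index, format='%b %d %Y')
    d2 = pd.DataFrame(index=pd.date_range('2014-01-01', '2014-12-31'))
    
    # Right join: keep every day of 2014; days missing from ts get NaN rows
    print(ts.join(d2, how='right'))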
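
    Why a join rather than `reindex`? One plausible reason, shown here on made-up data: absence records can contain several rows for the same day, and `DataFrame.reindex` raises a `ValueError` when the existing index has duplicate labels, whereas the join keeps every record and adds NaN rows for the empty days.

```python
import pandas as pd

# Made-up absence records: two entries on Jan 2, nothing on Jan 1 or Jan 3
ts = pd.DataFrame({'Description': ['Discipline', 'Vacation'], 'Instances': [1, 2]},
                  index=pd.to_datetime(['2014-01-02', '2014-01-02']))
full_range = pd.date_range('2014-01-01', '2014-01-03')

# reindex requires unique labels, so it fails when a day has several records
try:
    ts.reindex(full_range)
    reindex_failed = False
except ValueError:
    reindex_failed = True

# The right join keeps every day of the range and every record on each day
joined = ts.join(pd.DataFrame(index=full_range), how='right')
print(joined)
```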
    