Pandas Reindex to Fill Missing Dates, or Better Method to Fill?


My data is absence records from a factory. Some days there are no absences so there is no data or date recorded for that day. However, and where this gets hairy with the o

2 Answers

天涯浪人

2021-01-20 03:18

You were actually pretty close to what you wanted (assuming I understood correctly the output you are looking for). Here is your code with my additions:

import pandas as pd

# Column 3 of the CSV holds the dates: parse it and use it as the index.
ts = pd.read_csv('Absentee_Data_2.csv', encoding='utf-8', parse_dates=[3],
                 index_col=3, dayfirst=True, sep=',')

# Every calendar day in the period, including days with no absences.
idx = pd.date_range('01.01.2009', '12.31.2017')

ts.index = pd.DatetimeIndex(ts.index)
#ts = ts.reindex(idx, fill_value='NaN')
# Left-join onto the full range: days missing from the CSV become NaN rows.
df = pd.DataFrame(index=idx)
df1 = df.join(ts, how='left')
df2 = df1.copy()
df3 = df1.copy()
df4 = df1.copy()
# Fill each copy with a different Description/Shift combination.
dict1 = {'Description': 'Discipline', 'Instances': 0, 'Shift': '1st Cooks'}
df1 = df1.fillna(dict1)
dict1["Description"] = "Vacation"
df2 = df2.fillna(dict1)
dict1["Shift"] = "2nd Baker"
df3 = df3.fillna(dict1)
dict1["Description"] = "Discipline"
df4 = df4.fillna(dict1)
df_with_duplicates = pd.concat([df1, df2, df3, df4])
# Remove the rows repeated across the four copies, then drop Unexcused.
final_res = (
    df_with_duplicates.reset_index()
    .drop_duplicates(subset=["index"] + list(dict1.keys()))
    .set_index("index")
    .drop("Unexcused", axis=1)
)

Basically, here is what you'd add:

Make four copies of the mostly empty DataFrame obtained by joining ts onto the full date range (df1).
fillna(dict1) fills every NaN in the listed columns with the given static values.
Concatenate the four DataFrames. The original values from the CSV now appear four times, so duplicates still have to be removed.
Drop the duplicates. The index has to be part of the comparison so that the added rows are kept, hence the reset_index followed by set_index("index").
Finally, drop the Unexcused column.
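The steps above can be sketched on a tiny frame so the result is easy to inspect. This is only an illustration under assumptions: the three-day range and the single absence record are made up to stand in for Absentee_Data_2.csv, the loop over combinations replaces the four hand-written copies, and there is no Unexcused column to drop.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: one recorded absence on 2014-01-02.
ts = pd.DataFrame(
    {"Description": ["Vacation"], "Instances": [2.0], "Shift": ["1st Cooks"]},
    index=pd.DatetimeIndex(["2014-01-02"]),
)

# Full date range; days without records become NaN rows after the left join.
idx = pd.date_range("2014-01-01", "2014-01-03")
base = pd.DataFrame(index=idx).join(ts, how="left")

# One filled copy per (Description, Shift) combination used in the answer.
combos = [
    ("Discipline", "1st Cooks"),
    ("Vacation", "1st Cooks"),
    ("Vacation", "2nd Baker"),
    ("Discipline", "2nd Baker"),
]
filled = [
    base.fillna({"Description": d, "Instances": 0, "Shift": s})
    for d, s in combos
]

# Real records are now repeated four times; keep one of each.
final_res = (
    pd.concat(filled)
    .reset_index()
    .drop_duplicates(subset=["index", "Description", "Instances", "Shift"])
    .set_index("index")
)
print(final_res)
```

Each empty day ends up with four zero-instance rows, while the day that already had data keeps its single original row.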

Finally, some sample output:

In [5]: final_res["2013-01-2"]
Out[5]: 
           Description  Instances      Shift
index                                       
2013-01-02  Discipline        0.0  1st Cooks
2013-01-02    Vacation        0.0  1st Cooks
2013-01-02    Vacation        0.0  2nd Baker
2013-01-02  Discipline        0.0  2nd Baker

In [6]: final_res["2014-01-2"]
Out[6]: 
           Description  Instances       Shift
index                                        
2014-01-02  Discipline        1.0   2nd Baker
2014-01-02    Vacation        2.0   1st Cooks
2014-01-02  Discipline        3.0   2nd Baker
2014-01-02    Vacation        1.0   1st Cooks
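The lookups above select rows by passing a date string directly in square brackets. As far as I know, recent pandas versions expect .loc for this kind of row selection, so here is a minimal sketch of the same lookup on a made-up frame (demo is hypothetical and only mimics the repeated dates in final_res).

```python
import pandas as pd

# Hypothetical frame with a repeated date in the index, similar to final_res.
demo = pd.DataFrame(
    {
        "Description": ["Discipline", "Vacation", "Vacation"],
        "Instances": [0.0, 0.0, 2.0],
    },
    index=pd.DatetimeIndex(["2013-01-02", "2013-01-02", "2014-01-02"]),
)

# Select all rows for one day by label.
rows = demo.loc["2013-01-02"]
print(rows)
```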

时光说笑

2021-01-20 03:26

I think you just have a problem with how the dates are handled. This approach worked for me:

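ts.set_index(['Date'], inplace=True)
# Parse index strings in month-name, day, year form into datetimes.
ts.index = pd.to_datetime(ts.index, format='%b %d %Y')
d2 = pd.DataFrame(index=pd.date_range('2014-01-01', '2014-12-31'))

# The right join keeps every day in d2; days without records show NaN.
print(ts.join(d2, how='right'))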
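To see what the right join produces, here is a small self-contained sketch of the same idea. The two records and the five-day range are invented for illustration, and the Date strings are assumed to follow the '%b %d %Y' format used above.

```python
import pandas as pd

# Hypothetical data with dates stored as text, e.g. 'Jan 02 2014'.
ts = pd.DataFrame({"Date": ["Jan 02 2014", "Jan 04 2014"], "Instances": [2, 1]})
ts = ts.set_index("Date")
ts.index = pd.to_datetime(ts.index, format="%b %d %Y")

# Empty frame whose index is the complete range of days.
d2 = pd.DataFrame(index=pd.date_range("2014-01-01", "2014-01-05"))

# Right join: one row per day in d2, NaN where ts has no record.
full = ts.join(d2, how="right")
print(full)
```

The NaN rows can then be filled with whatever default makes sense, for example with fillna as in the other answer.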
