My data is absence records from a factory. Some days there are no absences so there is no data or date recorded for that day. However, and where this gets hairy with the o
Actually, you were pretty close to what you wanted (assuming I correctly understood the output you are looking for). Here is your code with my additions:
import pandas as pd
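# Parse column 3 as day-first dates and use it as the index.
ts = pd.read_csv('Absentee_Data_2.csv', encoding='utf-8', parse_dates=[3], index_col=3, dayfirst=True, sep=",")
# Every calendar day in the period, including days with no absences.
idx = pd.date_range('01.01.2009', '12.31.2017')
ts.index = pd.DatetimeIndex(ts.index)
#ts = ts.reindex(idx, fill_value='NaN')
# Left-join the records onto an empty frame that holds the full date range,
# so dates without records show up as rows of NaN.
df = pd.DataFrame(index=idx)
df1 = df.join(ts, how='left')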
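# One copy per Description/Shift combination to fill in on the empty days.
df2 = df1.copy()
df3 = df1.copy()
df4 = df1.copy()
# Static values used to replace NaN; the dict is mutated between copies.
dict1 = {'Description': 'Discipline', 'Instances': 0, 'Shift': '1st Cooks'}
df1 = df1.fillna(dict1)  # Discipline / 1st Cooks
dict1["Description"] = "Vacation"
df2 = df2.fillna(dict1)  # Vacation / 1st Cooks
dict1["Shift"] = "2nd Baker"
df3 = df3.fillna(dict1)  # Vacation / 2nd Baker
dict1["Description"] = "Discipline"
df4 = df4.fillna(dict1)  # Discipline / 2nd Baker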
df_with_duplicates = pd.concat([df1,df2,df3,df4])
final_res = (
    df_with_duplicates
    .reset_index()
    .drop_duplicates(subset=["index"] + list(dict1.keys()))
    .set_index("index")
    .drop("Unexcused", axis=1)
)
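Basically, what you'd add:
- a left join of `ts` onto an empty DataFrame indexed by the full date range (`df1`)
- `fillna(dict1)`, which fills all the NaN in the columns with static values
- `reset_index` and `drop_duplicates`, followed by `set_index("index")`
Finally, some sample output: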
In [5]: final_res["2013-01-2"]
Out[5]:
Description Instances Shift
index
2013-01-02 Discipline 0.0 1st Cooks
2013-01-02 Vacation 0.0 1st Cooks
2013-01-02 Vacation 0.0 2nd Baker
2013-01-02 Discipline 0.0 2nd Baker
In [6]: final_res["2014-01-2"]
Out[6]:
Description Instances Shift
index
2014-01-02 Discipline 1.0 2nd Baker
2014-01-02 Vacation 2.0 1st Cooks
2014-01-02 Discipline 3.0 2nd Baker
2014-01-02 Vacation 1.0 1st Cooks
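To try the same join / fillna / drop_duplicates idea without the CSV, here is a minimal sketch on made-up data. The three-day date range, the two sample records, and the names `base`, `combos`, and `result` are assumptions for illustration only; they are not taken from your file.

```python
import pandas as pd

# Made-up absence records: only one date in the range has data.
ts = pd.DataFrame(
    {
        "Description": ["Vacation", "Discipline"],
        "Instances": [2, 1],
        "Shift": ["1st Cooks", "2nd Baker"],
    },
    index=pd.DatetimeIndex(["2014-01-02", "2014-01-02"]),
)

# Full calendar for the (hypothetical) period of interest.
idx = pd.date_range("2014-01-01", "2014-01-03")

# Left join onto an empty frame so every date appears; dates with no
# records become a single row of NaN.
base = pd.DataFrame(index=idx).join(ts, how="left")

# Description/Shift combinations to create on the empty days.
combos = [
    ("Discipline", "1st Cooks"),
    ("Vacation", "1st Cooks"),
    ("Vacation", "2nd Baker"),
    ("Discipline", "2nd Baker"),
]

# Fill the NaN rows once per combination; rows that already had data
# are unchanged, so they come out identical in every copy.
filled = [
    base.fillna({"Description": d, "Instances": 0, "Shift": s})
    for d, s in combos
]

# Stack the copies and drop the repeated real records.
result = (
    pd.concat(filled)
    .reset_index()
    .drop_duplicates(subset=["index", "Description", "Instances", "Shift"])
    .set_index("index")
    .sort_index()
)
print(result)
```

Days with no records end up with one zero-instance row per combination, while the day that had data keeps only its original rows. Note that `drop_duplicates` would also collapse real records that happen to be identical on the date and all three columns.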
I think you just have a problem with how the dates are parsed. This approach worked for me:
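# Use the 'Date' column as the index and parse strings such as 'Jan 02 2014'.
ts.set_index(['Date'], inplace=True)
ts.index = pd.to_datetime(ts.index, format='%b %d %Y')
# Right-join onto the full 2014 calendar so days without records appear as NaN.
d2 = pd.DataFrame(index=pd.date_range('2014-01-01', '2014-12-31'))
print(ts.join(d2, how='right'))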
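For a quick check of this approach without the original file, the sketch below runs it on a tiny made-up frame. The sample rows, the five-day range, the `Instances` column, and the name `full` are assumptions for illustration; the date strings are assumed to look like `Jan 02 2014`, which is what the `'%b %d %Y'` format expects.

```python
import pandas as pd

# Made-up records with dates stored as text, e.g. "Jan 02 2014".
ts = pd.DataFrame(
    {
        "Date": ["Jan 02 2014", "Jan 02 2014", "Jan 05 2014"],
        "Instances": [2, 1, 3],
    }
)

# Move the text dates into the index and convert them to real timestamps.
ts = ts.set_index("Date")
ts.index = pd.to_datetime(ts.index, format="%b %d %Y")

# Empty frame that carries only the full (hypothetical) calendar.
d2 = pd.DataFrame(index=pd.date_range("2014-01-01", "2014-01-05"))

# Right join keeps every calendar day; days with no records get NaN.
full = ts.join(d2, how="right").sort_index()
print(full)
```

Days that had several records keep all of them, and each missing day appears once with NaN, which can then be filled with `fillna` if zeros are wanted.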