Question
I have a df that contains one line per item with a range of dates, and I need to expand it to contain one row per day per item.
It looks like this:
         from          to id
1  25/02/2019  27/02/2019  A
2  15/07/2019  16/07/2019  B
And I want this:
         date id
1  25/02/2019  A
2  26/02/2019  A
3  27/02/2019  A
4  15/07/2019  B
5  16/07/2019  B
I managed to write code that works, but it takes over an hour to run, so I am wondering whether there is a more efficient way to do it.
My code:
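import numpy as np
import pandas as pd

# Note: DataFrame.append was removed in pandas 2.0, so this only runs on older versions.
df_dates = pd.DataFrame()
for i in range(len(df)):
    start = df.loc[i, 'from']
    end = df.loc[i, 'to'] + np.timedelta64(1, 'D')  # includes last day of the range
    dates = np.arange(start, end, dtype='datetime64[D]')
    # repeat the current row once per day in its range
    temp = pd.DataFrame()
    temp = temp.append([df.loc[i]] * len(dates), ignore_index=True)
    temp['datadate'] = dates
    # appending inside the loop copies the growing result on every iteration
    df_dates = df_dates.append(temp, ignore_index=True)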
It takes so long because the real ranges span about 50 years for over 1700 items, so the new df is massive, but maybe you know a trick to do the same thing faster :)
Answer 1:
Try:
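df['from'] = pd.to_datetime(df['from'], dayfirst=True)  # dates are day/month/year
df['to'] = pd.to_datetime(df['to'], dayfirst=True)
# build one small frame per row, then concatenate them all at once
pd.concat([pd.DataFrame({'date': pd.date_range(row['from'], row['to'], freq='D'), 'id': row['id']})
           for _, row in df.iterrows()], ignore_index=True)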
        date id
0 2019-02-25  A
1 2019-02-26  A
2 2019-02-27  A
3 2019-07-15  B
4 2019-07-16  B
Answer 2:
First convert the date columns with to_datetime. Then use itertuples and date_range with concat to build the expanded DataFrame:
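df['from1'] = pd.to_datetime(df['from'], dayfirst=True)  # dates are day/month/year
df['to1'] = pd.to_datetime(df['to'], dayfirst=True)
# one Series per row: the dates form the index, the id is the repeated value
L = [pd.Series(r.id, pd.date_range(r.from1, r.to1)) for r in df.itertuples()]
df1 = pd.concat(L).reset_index()
df1.columns = ['date', 'id']
print(df1)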
        date id
0 2019-02-25  A
1 2019-02-26  A
2 2019-02-27  A
3 2019-07-15  B
4 2019-07-16  B
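For inputs as large as the one described in the question (about 50 years of days for over 1700 items), the per-row Python loop can also be avoided entirely. The following is a minimal sketch and is not taken from either answer. It assumes the sample data from the question and day/month/year date strings. The names start, end, n_days, row_pos, offsets and expanded are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed to be day/month/year strings).
df = pd.DataFrame({
    "from": ["25/02/2019", "15/07/2019"],
    "to": ["27/02/2019", "16/07/2019"],
    "id": ["A", "B"],
})

start = pd.to_datetime(df["from"], format="%d/%m/%Y")
end = pd.to_datetime(df["to"], format="%d/%m/%Y")

# Number of days in each range, counting both endpoints.
n_days = ((end - start).dt.days + 1).to_numpy()

# Position of the source row for every output row.
row_pos = np.repeat(np.arange(len(df)), n_days)

# Day offset within each range: 0, 1, 2, ... restarting for every source row.
offsets = np.arange(n_days.sum()) - np.repeat(n_days.cumsum() - n_days, n_days)

expanded = pd.DataFrame({
    "date": start.to_numpy()[row_pos] + offsets.astype("timedelta64[D]"),
    "id": df["id"].to_numpy()[row_pos],
})
print(expanded)
```

All of the work here happens in a handful of NumPy array operations, so the cost should grow with the number of output rows rather than with the number of Python-level iterations. Whether it is actually faster on the real data would need to be measured.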
Source: https://stackoverflow.com/questions/60148160/expand-df-with-range-of-dates-to-one-row-per-day