Question
I have a CSV file with two columns containing dates and 0 or 1, like so:
17/08/2012 07:47:16 0
17/08/2012 07:54:31 1
17/08/2012 08:02:31 0
17/08/2012 09:22:33 0
17/08/2012 09:58:05 0
17/08/2012 12:26:59 1
17/08/2012 20:56:00 0
18/08/2012 10:04:06 0
18/08/2012 10:42:52 0
20/08/2012 07:22:02 0
20/08/2012 07:54:28 0
20/08/2012 08:01:58 0
20/08/2012 08:16:31 1
20/08/2012 08:26:38 0
20/08/2012 08:55:19 1
20/08/2012 09:00:09 0
20/08/2012 09:26:11 0
20/08/2012 09:50:10 0
20/08/2012 10:33:37 0
20/08/2012 10:39:13 0
20/08/2012 10:39:35 1
20/08/2012 11:15:07 1
20/08/2012 11:19:15 0
20/08/2012 11:21:01 0
I load this file into a DataFrame raw_data and then change the index to a Timestamp:
ts_data = raw_data.set_index(pd.to_datetime(raw_data.when_created, dayfirst=True))
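For reference, a minimal sketch of how the file might have been loaded; the file name and the second column's name (converted) are only illustrative, since the question names only when_created:

```python
import pandas as pd

# Hypothetical file name and second column name; the question only names 'when_created'.
raw_data = pd.read_csv('conversions.csv', names=['when_created', 'converted'])

# Parse the day-first timestamps and use them as the index, as in the question.
ts_data = raw_data.set_index(pd.to_datetime(raw_data.when_created, dayfirst=True))
```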
I then try to downsample the data using:
daily_conversions = ts_data.resample('D', how='sum')
It works for all days (more than 7 months of data; here I only include a subset) except one day, where I get this output:
2012-08-20 NaN
This does not make sense, as you can see from the data. The interesting part is that if I downsample using a higher frequency like 'h', I get correct results for that specific day: null values for the hours that are not present, 0 for the hours that are present but contain only 0s, and a correct sum for the hours that contain 1s. Any ideas, please?
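For comparison, the hourly check described above can be reproduced in current pandas, where how='sum' has been removed in favour of calling .sum() on the resampler; the column name converted and the min_count=1 argument (to keep empty hours as NaN, as how='sum' used to) are my additions:

```python
# Hourly downsample: empty hours stay NaN, hours with only 0s sum to 0,
# and hours containing a 1 get the correct positive sum.
hourly_conversions = ts_data.converted.resample('h').sum(min_count=1)
print(hourly_conversions['2012-08-20'])
```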
Answer 1:
After a helpful comment above I realised what was wrong. It is just a matter of labelling. In reality, the date that should return NaN is the 19th, but the default setting is label='right', so it was showing as the 20th. When I add label='left' it works fine. Thanks.
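In today's pandas, where the how= keyword no longer exists, the fix would look roughly like this (the column name converted is assumed, and label='left' is now the default for daily bins anyway, so it is passed explicitly only to mirror the answer):

```python
# Label each daily bin by its left edge; min_count=1 keeps empty days as NaN,
# which is what how='sum' used to produce.
daily_conversions = ts_data.converted.resample('D', label='left').sum(min_count=1)
```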
Source: https://stackoverflow.com/questions/15821194/pandas-downsampling-issue