From pd.date_range(\'2016-01\', \'2016-05\', freq=\'M\', ).strftime(\'%Y-%m\')
, the last month is 2016-04
, but I was expecting it to be 2016
The explanation for this issue is that the function pd.to_datetime()
converts a '%Y-%m'
date string by default to the first of the month datetime, or '%Y-%m-01'
:
>>> pd.to_datetime('2016-05')
Timestamp('2016-05-01 00:00:00')
>>> pd.date_range('2016-01', '2016-02')
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
'2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
'2016-01-09', '2016-01-10', '2016-01-11', '2016-01-12',
'2016-01-13', '2016-01-14', '2016-01-15', '2016-01-16',
'2016-01-17', '2016-01-18', '2016-01-19', '2016-01-20',
'2016-01-21', '2016-01-22', '2016-01-23', '2016-01-24',
'2016-01-25', '2016-01-26', '2016-01-27', '2016-01-28',
'2016-01-29', '2016-01-30', '2016-01-31', '2016-02-01'],
dtype='datetime64[ns]', freq='D')
Then everything follows from that. Specifying freq='M'
includes month ends between 2016-01-01 and 2016-05-01, which is the list you receive and excludes 2016-05-31. But specifying month starts 'MS'
like the second answer provides, includes 2016-05-01 as it falls within the range. pd.date_range()
default behavior isn't like the range
method since ends are included. From the docs:
closed controls whether to include start and end that are on the boundary. The default includes boundary points on either end.