I have a series within a DataFrame that I read in initially as an object, and then need to convert it to a date in the form of yyyy-mm-dd where dd is the end of the month.
Agreed that root offers the right method. However, readers who blindly use MonthEnd(1) are in for a surprise if they use the last date of the month as an input:
In [4]: pd.Timestamp('2014-01-01')+MonthEnd(1)
Out[4]: Timestamp('2014-01-31 00:00:00')
In [5]: pd.Timestamp('2014-01-31')+MonthEnd(1)
Out[5]: Timestamp('2014-02-28 00:00:00')
Using MonthEnd(0) instead gives this:
In [7]: pd.Timestamp('2014-01-01')+MonthEnd(0)
Out[7]: Timestamp('2014-01-31 00:00:00')
In [8]: pd.Timestamp('2014-01-31')+MonthEnd(0)
Out[8]: Timestamp('2014-01-31 00:00:00')
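The same difference shows up when the offset is applied to a whole column. Here is a minimal sketch, assuming a small made-up Series of January dates (the variable names are illustrative only):

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Illustrative dates: start, middle, and end of the same month
dates = pd.Series(pd.to_datetime(["2014-01-01", "2014-01-15", "2014-01-31"]))

# MonthEnd(1) always moves forward, so a date already on a month end
# jumps to the end of the following month
rolled_with_1 = dates + MonthEnd(1)

# MonthEnd(0) rolls forward only when the date is not already a month end
rolled_with_0 = dates + MonthEnd(0)

print(rolled_with_1.dt.strftime("%Y-%m-%d").tolist())
print(rolled_with_0.dt.strftime("%Y-%m-%d").tolist())
```

With MonthEnd(0) every row should land on 2014-01-31, while MonthEnd(1) pushes the last row to 2014-02-28.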
# Additional example
import pandas as pd
from pandas.tseries.offsets import MonthEnd
# Month end of the current month, formatted as a string
(pd.Timestamp.now() + MonthEnd(0)).strftime('%Y-%m-%dT00:00:00')
You can use pandas.tseries.offsets.MonthEnd:
from pandas.tseries.offsets import MonthEnd
df['Date'] = pd.to_datetime(df['Date'], format="%Y%m") + MonthEnd(1)
The 1 in MonthEnd just specifies to move one step forward to the next date that's a month end. (Using 0 or leaving it blank, which defaults to 1, would also work in your case, because the parsed dates all fall on the first of the month.) If you wanted the last day of the next month, you'd use MonthEnd(2), etc. This should work for any month, so you don't need to know the number of days in the month, or anything like that. More offset information can be found in the documentation.
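To make the role of the step count concrete, here is a small sketch that applies a few values to a single first-of-month date. The date and variable names are chosen only for illustration:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# A first-of-month date, like the ones produced by format="%Y%m"
start = pd.Timestamp("2016-02-01")

same_month_end = start + MonthEnd(1)  # one step forward: end of February
default_step = start + MonthEnd()     # n defaults to 1
zero_step = start + MonthEnd(0)       # rolls forward only if not already a month end
next_month_end = start + MonthEnd(2)  # two steps forward: end of March

print(same_month_end.date(), default_step.date(), zero_step.date(), next_month_end.date())
```

For a date on the first of the month, the first three results coincide (2016-02-29 here, a leap year), and only MonthEnd(2) moves on to 2016-03-31.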
Example usage and output:
df = pd.DataFrame({'Date': [200104, 200508, 201002, 201602, 199912, 200611]})
df['EndOfMonth'] = pd.to_datetime(df['Date'], format="%Y%m") + MonthEnd(1)
     Date EndOfMonth
0  200104 2001-04-30
1  200508 2005-08-31
2  201002 2010-02-28
3  201602 2016-02-29
4  199912 1999-12-31
5  200611 2006-11-30
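If the column was read in as strings and a literal yyyy-mm-dd text column is also wanted, one possible follow-up is sketched below. The sample values and the EndOfMonthStr column name are assumptions for illustration, and MonthEnd(0) is used so that a date already on a month end would stay put:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Hypothetical input: year-month values read in as strings
df = pd.DataFrame({"Date": ["200104", "201602", "199912"]})

# Parse to datetime (the day defaults to 1), then roll forward to the month end
df["EndOfMonth"] = pd.to_datetime(df["Date"], format="%Y%m") + MonthEnd(0)

# Optional: a plain 'yyyy-mm-dd' string column for display or export
df["EndOfMonthStr"] = df["EndOfMonth"].dt.strftime("%Y-%m-%d")

print(df)
```

Keeping EndOfMonth as a datetime column and formatting only at output time leaves date arithmetic and sorting available; the string column is there purely for presentation.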