可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I have a Dataframe, df, with the following column:
df['ArrivalDate'] = ... 936 2012-12-31 938 2012-12-29 965 2012-12-31 966 2012-12-31 967 2012-12-31 968 2012-12-31 969 2012-12-31 970 2012-12-29 971 2012-12-31 972 2012-12-29 973 2012-12-29 ...
The elements of the column are pandas.tslib.Timestamp.
I want to just include the year and month. I thought there would be simple way to do it, but I can't figure it out.
Here's what I've tried:
df['ArrivalDate'].resample('M', how = 'mean')
I got the following error:
Only valid with DatetimeIndex or PeriodIndex
Then I tried:
df['ArrivalDate'].apply(lambda(x):x[:-2])
I got the following error:
'Timestamp' object has no attribute '__getitem__'
Any suggestions?
Edit: I sort of figured it out.
df.index = df['ArrivalDate']
Then, I can resample another column using the index.
But I'd still like a method for reconfiguring the entire column. Any ideas?
回答1:
You can directly access the year
and month
attributes, or request a datetime.datetime
:
In [15]: t = pandas.tslib.Timestamp.now() In [16]: t Out[16]: Timestamp('2014-08-05 14:49:39.643701', tz=None) In [17]: t.to_datetime() Out[17]: datetime.datetime(2014, 8, 5, 14, 49, 39, 643701) In [18]: t.day Out[18]: 5 In [19]: t.month Out[19]: 8 In [20]: t.year Out[20]: 2014
One way to combine year and month is to make an integer encoding them, such as: 201408
for August, 2014. Along a whole column, you could do this as:
df['YearMonth'] = df['ArrivalDate'].map(lambda x: 1000*x.year + x.month)
or many variants thereof.
I'm not a big fan of doing this, though, since it makes date alignment and arithmetic painful later and especially painful for others who come upon your code or data without this same convention. A better way is to choose a day-of-month convention, such as final non-US-holiday weekday, or first day, etc., and leave the data in a date/time format with the chosen date convention.
The calendar
module is useful for obtaining the number value of certain days such as the final weekday. Then you could do something like:
import calendar import datetime df['AdjustedDateToEndOfMonth'] = df['ArrivalDate'].map( lambda x: datetime.datetime( x.year, x.month, max(calendar.monthcalendar(x.year, x.month)[-1][:5]) ) )
If you happen to be looking for a way to solve the simpler problem of just formatting the datetime column into some stringified representation, for that you can just make use of the strftime
function from the datetime.datetime
class, like this:
In [5]: df Out[5]: date_time 0 2014-10-17 22:00:03 In [6]: df.date_time Out[6]: 0 2014-10-17 22:00:03 Name: date_time, dtype: datetime64[ns] In [7]: df.date_time.map(lambda x: x.strftime('%Y-%m-%d')) Out[7]: 0 2014-10-17 Name: date_time, dtype: object
回答2:
If you want new columns showing year and month separately you can do this:
df['year'] = pd.DatetimeIndex(df['ArrivalDate']).year df['month'] = pd.DatetimeIndex(df['ArrivalDate']).month
or...
df['year'] = df['ArrivalDate'].dt.year df['month'] = df['ArrivalDate'].dt.month
Then you can combine them or work with them just as they are.
回答3:
Best way found!!
the date_column
has to be in date time format.
df['month_year'] = df.date_column.dt.to_period('M')
You could also use D
for Day, 2M
for 2 months, 45Min
for 45 min sampling etc.
回答4:
If you want the month year unique pair, using apply is pretty sleek.
df['mnth_yr'] = df['date_column'].apply(lambda x: x.strftime('%B-%Y'))
outputs month-year in one column.
don't forget to first change the format to date-time before, I generally forget :|
df['date_column'] = pd.to_datetime(df['date_column'])
回答5:
Thanks to jaknap32, I wanted to aggregate the results according to Year and Month, so this worked:
df_join['YearMonth'] = df_join['timestamp'].apply(lambda x:x.strftime('%Y%m'))
Output was neat:
0 201108 1 201108 2 201108
回答6:
df['year_month']=df.datetime_column.apply(lambda x: str(x)[:7])
This worked fine for me, didn't think pandas would interpret the resultant string date as date, but when i did the plot, it knew very well my agenda and the string year_month where ordered properly... gotta love pandas!
回答7:
You can first convert your date strings with pandas.to_datetime, which gives you access to all of the numpy datetime and timedelta facilities. For example:
df['ArrivalDate'] = pandas.to_datetime(df['ArrivalDate']) df['Month'] = df['ArrivalDate'].values.astype('datetime64[M]')
回答8:
Extracting the Year say from ['2018-03-04']
df['Year'] = pd.DatetimeIndex(df['date']).year
The df['Year'] creates a new column. While if you want to extract the month just use .month