Extracting just Month and Year separately from Pandas Datetime column

抹茶落季 2020-11-22 09:09

I have a DataFrame, df, with the following column:

df['ArrivalDate'] =
...
936   2012-12-31
938   2012-12-29
965   2012-12-31
966   2012-12-31
967   2012-1         


        
11 Answers
  • 2020-11-22 09:41

    You can directly access the year and month attributes, or request a datetime.datetime:

    In [15]: t = pandas.Timestamp.now()
    
    In [16]: t
    Out[16]: Timestamp('2014-08-05 14:49:39.643701', tz=None)
    
    In [17]: t.to_pydatetime() #datetime method is deprecated
    Out[17]: datetime.datetime(2014, 8, 5, 14, 49, 39, 643701)
    
    In [18]: t.day
    Out[18]: 5
    
    In [19]: t.month
    Out[19]: 8
    
    In [20]: t.year
    Out[20]: 2014
    

    One way to combine year and month is to make an integer encoding them, such as: 201408 for August, 2014. Along a whole column, you could do this as:

    df['YearMonth'] = df['ArrivalDate'].map(lambda x: 100*x.year + x.month)
    

    or many variants thereof.
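
    As a rough sketch of one such variant, the same integer encoding can be computed without a lambda by using the .dt accessor on the whole column. The sample dates and the decoded column names below are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's column
df = pd.DataFrame(
    {"ArrivalDate": pd.to_datetime(["2012-12-31", "2012-12-29", "2014-08-05"])}
)

# Vectorized equivalent of the lambda: 100 * year + month
dates = df["ArrivalDate"]
df["YearMonth"] = dates.dt.year * 100 + dates.dt.month

# Recover the two parts again with integer division and modulo
df["year_decoded"] = df["YearMonth"] // 100
df["month_decoded"] = df["YearMonth"] % 100

print(df)
```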

    I'm not a big fan of doing this, though: it makes date alignment and arithmetic painful later, especially for others who come upon your code or data without knowing the convention. A better way is to choose a day-of-month convention, such as the final non-US-holiday weekday or the first day of the month, and leave the data in a date/time format with the chosen convention.

    The calendar module is useful for obtaining the day number of certain days, such as the final weekday of a month. Then you could do something like:

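    import calendar
    import datetime

    # monthcalendar() returns Monday-first weeks padded with zeros; the first five
    # entries of the last week are its weekdays, so their max is the final weekday.
    df['AdjustedDateToEndOfMonth'] = df['ArrivalDate'].map(
        lambda x: datetime.datetime(
            x.year,
            x.month,
            max(calendar.monthcalendar(x.year, x.month)[-1][:5])
        )
    )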
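
    For the simpler "first day of the month" convention, one possible sketch is to round-trip through a monthly period. This is a different technique from the calendar-based code above, and the sample dates and the MonthStart column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's column
df = pd.DataFrame(
    {"ArrivalDate": pd.to_datetime(["2012-12-31", "2012-12-29", "2014-08-05"])}
)

# Convert each date to a monthly period, then back to a timestamp
# at the start of that period (the first day of the month)
df["MonthStart"] = df["ArrivalDate"].dt.to_period("M").dt.to_timestamp()

print(df)
```

    The result stays in a datetime dtype, so date arithmetic and alignment keep working.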
    

    If you only need the simpler result of formatting the datetime column as strings, you can use the strftime method of datetime.datetime, like this:

    In [5]: df
    Out[5]: 
                date_time
    0 2014-10-17 22:00:03
    
    In [6]: df.date_time
    Out[6]: 
    0   2014-10-17 22:00:03
    Name: date_time, dtype: datetime64[ns]
    
    In [7]: df.date_time.map(lambda x: x.strftime('%Y-%m-%d'))
    Out[7]: 
    0    2014-10-17
    Name: date_time, dtype: object
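
    A possible vectorized alternative, assuming the column already has a datetime dtype, is the .dt.strftime accessor, here with a format that keeps only year and month. The year_month column name is made up for this sketch.

```python
import pandas as pd

# Same single-row frame as in the session above
df = pd.DataFrame({"date_time": pd.to_datetime(["2014-10-17 22:00:03"])})

# Format the whole column at once; the result holds strings, not datetimes
df["year_month"] = df["date_time"].dt.strftime("%Y-%m")

print(df)
```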
    
  • 2020-11-22 09:43

    If you want new columns showing year and month separately you can do this:

    df['year'] = pd.DatetimeIndex(df['ArrivalDate']).year
    df['month'] = pd.DatetimeIndex(df['ArrivalDate']).month
    

    or...

    df['year'] = df['ArrivalDate'].dt.year
    df['month'] = df['ArrivalDate'].dt.month
    

    Then you can combine them or work with them just as they are.
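
    As one illustration of working with the separate columns, the sketch below counts rows per (year, month) pair. The sample dates and the arrivals_per_month name are assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's column
df = pd.DataFrame(
    {"ArrivalDate": pd.to_datetime(["2012-12-31", "2012-12-29", "2013-01-02"])}
)
df["year"] = df["ArrivalDate"].dt.year
df["month"] = df["ArrivalDate"].dt.month

# Count arrivals for each (year, month) combination
arrivals_per_month = df.groupby(["year", "month"]).size()

print(arrivals_per_month)
```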

  • 2020-11-22 09:45

    @KieranPC's solution is the correct approach for Pandas, but it is not easily extensible to arbitrary attributes. For that, you can use getattr within a generator expression and combine the results using pd.concat:

    import pandas as pd

    # input data
    list_of_dates = ['2012-12-31', '2012-12-29', '2012-12-30']
    df = pd.DataFrame({'ArrivalDate': pd.to_datetime(list_of_dates)})
    
    # define list of attributes required
    # ('weekofyear' is omitted here: Series.dt.weekofyear was removed in pandas 2.0)
    L = ['year', 'month', 'day', 'dayofweek', 'dayofyear', 'quarter']
    
    # define generator expression of series, one for each attribute
    date_gen = (getattr(df['ArrivalDate'].dt, i).rename(i) for i in L)
    
    # concatenate results and join to original dataframe
    df = df.join(pd.concat(date_gen, axis=1))
    
    print(df)
    
      ArrivalDate  year  month  day  dayofweek  dayofyear  quarter
    0  2012-12-31  2012     12   31          0        366        4
    1  2012-12-29  2012     12   29          5        364        4
    2  2012-12-30  2012     12   30          6        365        4
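
    The week number is a special case: Series.dt.weekofyear is no longer available in pandas 2.0 and later. A sketch of one way to get it on pandas 1.1 or newer is dt.isocalendar(), which returns the ISO year, week and day as a small DataFrame. The iso_week column name is made up for this example.

```python
import pandas as pd

# Same input data as in the example above
list_of_dates = ["2012-12-31", "2012-12-29", "2012-12-30"]
df = pd.DataFrame({"ArrivalDate": pd.to_datetime(list_of_dates)})

# isocalendar() yields a DataFrame with 'year', 'week' and 'day' columns
iso = df["ArrivalDate"].dt.isocalendar()
df["iso_week"] = iso["week"]

print(df)
```

    Note that 2012-12-31 falls in ISO week 1 of ISO year 2013, so the ISO year can differ from the calendar year.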
    
  • 2020-11-22 09:45

    There are two steps to extract the year for the whole DataFrame without using apply.

    Step 1

    Convert the column to datetime:

    df['ArrivalDate']=pd.to_datetime(df['ArrivalDate'], format='%Y-%m-%d')
    

    Step 2

    Extract the year or the month using pd.DatetimeIndex:

    pd.DatetimeIndex(df['ArrivalDate']).year
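
    Put together, the two steps might look like the sketch below, starting from date strings. The sample values and the year and month column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical input: dates stored as plain strings
df = pd.DataFrame({"ArrivalDate": ["2012-12-31", "2012-12-29", "2013-01-02"]})

# Step 1: parse the strings into a datetime column
df["ArrivalDate"] = pd.to_datetime(df["ArrivalDate"], format="%Y-%m-%d")

# Step 2: build a DatetimeIndex once and read its attributes
idx = pd.DatetimeIndex(df["ArrivalDate"])
df["year"] = idx.year
df["month"] = idx.month

print(df)
```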
    
  • 2020-11-22 09:47

    You can first convert your date strings with pandas.to_datetime, which gives you access to all of the NumPy datetime and timedelta facilities. For example:

    df['ArrivalDate'] = pandas.to_datetime(df['ArrivalDate'])
    df['Month'] = df['ArrivalDate'].values.astype('datetime64[M]')
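
    To see what the datetime64[M] cast does on its own, here is a small NumPy-only sketch with made-up dates. Casting to a coarser unit truncates each value, and the month and year numbers can then be recovered with plain integer arithmetic.

```python
import numpy as np

# Hypothetical day-resolution dates
days = np.array(["2012-12-31", "2012-12-29", "2013-01-02"], dtype="datetime64[D]")

# Truncate to month resolution and to year resolution
months = days.astype("datetime64[M]")
years = days.astype("datetime64[Y]")

# Months elapsed since the start of the year, shifted to the range 1..12
month_numbers = (months - years).astype(int) + 1

# datetime64[Y] values count years since 1970
year_numbers = years.astype(int) + 1970

print(months, month_numbers, year_numbers)
```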
    