Which is the fastest way to extract day, month and year from a given date?

臣服心动 2020-11-27 19:10

I read a csv file containing 150,000 lines into a pandas dataframe. This dataframe has a field, Date, with the dates in yyyy-mm-dd format. I want t

2 Answers
  • 2020-11-27 19:44

    I use the code below, which works well for me:

    df['Year']=[d.split('-')[0] for d in df.Date]
    df['Month']=[d.split('-')[1] for d in df.Date]
    df['Day']=[d.split('-')[2] for d in df.Date]
    
    df.head(5)
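
The list comprehensions above call `split` three times for every row. A minimal sketch of a vectorized alternative, assuming the `Date` column holds `yyyy-mm-dd` strings as in the question. The sample frame is made up for illustration, and whether this is actually faster is worth timing on your own data:

```python
import pandas as pd

# Hypothetical sample standing in for the CSV's Date column (yyyy-mm-dd strings).
df = pd.DataFrame({"Date": ["2014-09-30", "2020-11-27", "1999-01-05"]})

# Split each string once and expand the pieces into separate columns.
parts = df["Date"].str.split("-", expand=True)
df["Year"] = parts[0]
df["Month"] = parts[1]
df["Day"] = parts[2]

print(df)
```

As with the list comprehensions, the resulting columns contain strings (for example `"09"`), not integers.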
    
  • 2020-11-27 19:58

    In pandas 0.15.0 you will be able to use the new .dt accessor, which makes this syntactically cleaner.

    In [36]: df = DataFrame(date_range('20000101',periods=150000,freq='H'),columns=['Date'])
    
    In [37]: df.head(5)
    Out[37]: 
                     Date
    0 2000-01-01 00:00:00
    1 2000-01-01 01:00:00
    2 2000-01-01 02:00:00
    3 2000-01-01 03:00:00
    4 2000-01-01 04:00:00
    
    [5 rows x 1 columns]
    
    In [38]: %timeit f(df)
    10 loops, best of 3: 22 ms per loop
    
    In [39]: def f(df):
        df = df.copy()
        df['Year'] = DatetimeIndex(df['Date']).year
        df['Month'] = DatetimeIndex(df['Date']).month
        df['Day'] = DatetimeIndex(df['Date']).day
        return df
       ....: 
    
    In [40]: f(df).head()
    Out[40]: 
                     Date  Year  Month  Day
    0 2000-01-01 00:00:00  2000      1    1
    1 2000-01-01 01:00:00  2000      1    1
    2 2000-01-01 02:00:00  2000      1    1
    3 2000-01-01 03:00:00  2000      1    1
    4 2000-01-01 04:00:00  2000      1    1
    
    [5 rows x 4 columns]
    

    From pandas 0.15.0 on (released in October 2014), the following is possible with the .dt accessor:

    df['Year'] = df['Date'].dt.year
    df['Month'] = df['Date'].dt.month
    df['Day'] = df['Date'].dt.day
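
The `.dt` accessor only works on a datetime column, while the question's `Date` field is read from a CSV as `yyyy-mm-dd` text. A minimal sketch of the full path, assuming the column arrives as strings. The sample frame is hypothetical:

```python
import pandas as pd

# Hypothetical sample; in the question these strings would come from the CSV file.
df = pd.DataFrame({"Date": ["2014-09-30", "2020-11-27", "1999-01-05"]})

# Parse the strings once, with an explicit format so pandas need not infer it.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# The .dt accessor exposes the datetime components as integer columns.
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day

print(df)
```

Alternatively, passing `parse_dates=["Date"]` to `pd.read_csv` should do the parsing at load time, so the conversion step can be dropped.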
    