Pandas Vectorized Date Offset Operations with Vector of Differing Offsets

既然无缘 2021-01-11 22:56

I am trying to do the following, but it seems that vectorized operations in this mode are not supported.

import pandas as pd
df=pd.DataFrame([[2017,1,15,1],
          


        
2 Answers
  • 2021-01-11 23:01

    Consider the following approach:

    In [94]: df['date'] = pd.to_datetime(df[['year','month','day']])
    
    In [95]: df['date_offset'] = df.apply(lambda x: x['date'] + pd.offsets.MonthEnd(x['month_offset']), axis=1)
    
    In [96]: df
    Out[96]:
       year  month  day  month_offset       date date_offset
    0  2017      1   15             1 2017-01-15  2017-01-31
    1  2017      1   15             2 2017-01-15  2017-02-28
    2  2017      1   15             3 2017-01-15  2017-03-31
    3  2017      1   15             4 2017-01-15  2017-04-30
    4  2017      1   15             5 2017-01-15  2017-05-31
    5  2017      1   15             6 2017-01-15  2017-06-30
    6  2017      1   15             7 2017-01-15  2017-07-31
    
  • 2021-01-11 23:02

    A truly vectorized way to do this is to truncate the dates to month precision (numpy.datetime64[M]), construct an array of numpy.timedelta64 month values from month_offset, add it to the truncated dates, then subtract numpy.timedelta64(1, 'D') to go back to the last day of the previous month.

    Solutions using apply(lambda) are likely to be much slower, because they call a Python function once per row. And as the warning said, some pandas date offset operations are not vectorized. If your data is large, it is better to avoid them. NumPy facilities such as busday_offset() and timedelta64 operate on whole arrays at once.
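
The NumPy approach described in this answer could look roughly like the sketch below. The input frame is reconstructed from the Out[96] table in the first answer, since the question's own code is cut off; the variable names month_start, months and month_end are illustrative and not from the original post. One assumption to be aware of: this matches pd.offsets.MonthEnd(n) for dates that are not already a month end. For a date that already falls on a month end, MonthEnd(n) rolls forward to a later month end, so the results would differ by one month.

```python
import numpy as np
import pandas as pd

# Input reconstructed from the Out[96] table in the first answer.
df = pd.DataFrame({
    "year": [2017] * 7,
    "month": [1] * 7,
    "day": [15] * 7,
    "month_offset": range(1, 8),
})
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Truncate each date to month precision (i.e. the first day of its month).
month_start = df["date"].to_numpy().astype("datetime64[M]")

# One timedelta64 per row, expressed in whole months.
months = df["month_offset"].to_numpy().astype("timedelta64[M]")

# Adding the months gives the first day of the month after the target month;
# stepping back one day lands on the last day of the target month.
month_end = (month_start + months).astype("datetime64[D]") - np.timedelta64(1, "D")

df["date_offset"] = month_end
print(df[["date", "month_offset", "date_offset"]])
```

Everything here runs as whole-array operations, with no Python-level loop over rows.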
