Calculate datetime difference in years, months, etc. in a new pandas DataFrame column

长发绾君心 2021-02-07 05:34

I have a pandas dataframe looking like this:

Name    start        end
A       2000-01-10   1970-04-29

I want to add a new column providing the

7 Answers
  • 2021-02-07 06:02

    This is fairly straightforward with dateutil's relativedelta:

    from dateutil import relativedelta
    
    >>          end      start
    >> 0 1970-04-29 2000-01-10
    
    # DataFrame.ix was removed in pandas 1.0, so build the column from the paired values instead
    df['diff'] = [relativedelta.relativedelta(s, e) for s, e in zip(df['start'], df['end'])]
    
    >>          end      start                                           diff
    >> 0 1970-04-29 2000-01-10  relativedelta(years=+29, months=+8, days=+12)
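
A column of relativedelta objects is awkward to sort or aggregate, so it may help to split it into plain integer columns. The following is a minimal sketch under that assumption; the column names `years`, `months` and `days` are illustrative and not part of the original answer.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({
    "start": pd.to_datetime(["2000-01-10"]),
    "end": pd.to_datetime(["1970-04-29"]),
})

# One relativedelta per row, computed as start minus end.
deltas = [relativedelta(s, e) for s, e in zip(df["start"], df["end"])]

# Hypothetical column names: expose each calendar component as an integer.
df["years"] = [d.years for d in deltas]
df["months"] = [d.months for d in deltas]
df["days"] = [d.days for d in deltas]

print(df)
```

For the sample row this gives 29 years, 8 months and 12 days, matching the relativedelta shown above.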
    
  • 2021-02-07 06:08

    A simple function is enough here.

    It approximates the difference in whole years and remaining months from the number of days in the timedelta.

    import pandas as pd
    import datetime
    
    DAYS_PER_YEAR = 365.25  # average year length, allowing for leap years
    
    def parse_date(td):
        years = td.days / DAYS_PER_YEAR           # fractional number of years
        months = int((years - int(years)) * 12)   # whole months contained in the fractional part
        return str(int(years)) + "Y" + str(months) + "m"
    
    df = pd.DataFrame([("2000-01-10", "1970-04-29")], columns=["start", "end"])
    df["delta"] = [parse_date(datetime.datetime.strptime(start, '%Y-%m-%d') - datetime.datetime.strptime(end, '%Y-%m-%d')) for start, end in zip(df["start"], df["end"])]
    print(df)
    
            start         end  delta
    0  2000-01-10  1970-04-29  29Y8m
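
The same day-count approximation can probably be done without a Python-level loop. This is only a sketch: it assumes an average year of 365.25 days (the name `DAYS_PER_YEAR` is an illustrative choice), and the month figure depends on that assumed year length, so it can differ from a calendar-exact result near month boundaries.

```python
import pandas as pd

DAYS_PER_YEAR = 365.25  # assumed average year length, not an exact calendar rule

df = pd.DataFrame([("2000-01-10", "1970-04-29")], columns=["start", "end"])

# Whole days between the two dates, computed for every row at once.
days = (pd.to_datetime(df["start"]) - pd.to_datetime(df["end"])).dt.days

# Convert days to whole months, then split into years and leftover months.
total_months = (days / DAYS_PER_YEAR * 12).astype(int)
df["delta"] = (total_months // 12).astype(str) + "Y" + (total_months % 12).astype(str) + "m"

print(df)
```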
    
  • 2021-02-07 06:09

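    A much simpler way is to build a month-end date_range between the two dates and take its length, which counts the month ends that fall in the interval:

    import pandas as pd

    startdt = pd.to_datetime('2017-01-01')
    enddt = pd.to_datetime('2018-01-01')
    # 'M' is the month-end frequency alias; pandas 2.2 and later spell it 'ME'
    len(pd.date_range(start=startdt, end=enddt, freq='M'))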
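
Note that the length of a monthly date_range counts month boundaries inside the interval, which is not always the number of whole months elapsed. The sketch below illustrates the gap with dates chosen for the example; it swaps in the month-start alias `'MS'` rather than the month-end alias used above, since `'MS'` is spelled the same across pandas versions.

```python
import pandas as pd

start = pd.Timestamp("2017-01-15")
end = pd.Timestamp("2017-03-10")

# Month starts falling inside [start, end]: 2017-02-01 and 2017-03-01.
boundaries = len(pd.date_range(start=start, end=end, freq="MS"))

# Whole months elapsed: the calendar month difference, minus one when the
# end day-of-month has not yet reached the start day-of-month.
whole_months = (end.year - start.year) * 12 + (end.month - start.month)
if end.day < start.day:
    whole_months -= 1

print(boundaries, whole_months)  # 2 1
```

Here two month boundaries are crossed, but only one full month (15 January to 15 February) has elapsed.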
    
  • 2021-02-07 06:14

    You can try the following function to calculate the difference:

    def yearmonthdiff(row):
        s = row['start']
        e = row['end']
        # total calendar months between the two dates (assumes start >= end)
        months = (s.year - e.year) * 12 + (s.month - e.month)
        # the last month is incomplete if start falls earlier within its month than end does
        if (s.day, s.hour, s.minute, s.second) < (e.day, e.hour, e.minute, e.second):
            months -= 1
        y, m = divmod(months, 12)
        return '{}y{}m'.format(y, m)
    

    Here row is a DataFrame row, and the start and end columns are assumed to hold datetime objects. Use DataFrame.apply() to run the function on each row.

    df
    
    Out[92]:
                           start                        end
    0 2000-01-10 00:00:00.000000 1970-04-29 00:00:00.000000
    1 2015-07-18 17:54:59.070381 2014-01-11 17:55:10.053381
    
    df['diff'] = df.apply(yearmonthdiff, axis=1)
    
    In [97]: df
    Out[97]:
                           start                        end   diff
    0 2000-01-10 00:00:00.000000 1970-04-29 00:00:00.000000  29y8m
    1 2015-07-18 17:54:59.070381 2014-01-11 17:55:10.053381   1y6m
    
  • 2021-02-07 06:19

    Similar to @DeepSpace's answer, here's a SAS-like implementation:

    import pandas as pd
    from dateutil import relativedelta
    
    def intck_month(start, end):
        rd = relativedelta.relativedelta(pd.to_datetime(end), pd.to_datetime(start))
        return rd.years, rd.months
    

    Usage:

    >> years, months = intck_month('1960-01-01', '1970-03-01')
    >> print(years)
    10
    >> print(months)
    2
    
  • 2021-02-07 06:26

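    If df['diff'] holds a timedelta (for example df['start'] - df['end']), you can add a column with the difference in fractional years. Recent pandas versions reject np.timedelta64(1, 'Y') as an ambiguous unit, so convert to days and divide by an average year length instead:

    import numpy as np
    df['diff_year'] = df['diff'] / np.timedelta64(1, 'D') / 365.25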
    