Add column with number of days between dates in DataFrame pandas

醉话见心 2020-11-28 03:03

I want to subtract dates in 'A' from dates in 'B' and add a new column with the difference.

df
          A        B
one 2014-01-01  2014-02-28 
two 2014-         


        
4 Answers
  • 2020-11-28 03:13

    Assuming these are datetime columns (if they're not, apply pd.to_datetime first), you can just subtract them:

    df['A'] = pd.to_datetime(df['A'])
    df['B'] = pd.to_datetime(df['B'])
    
    In [11]: df.dtypes  # if already datetime64 you don't need to use to_datetime
    Out[11]:
    A    datetime64[ns]
    B    datetime64[ns]
    dtype: object
    
    In [12]: df['A'] - df['B']
    Out[12]:
    one   -58 days
    two   -26 days
    dtype: timedelta64[ns]
    
    In [13]: df['C'] = df['A'] - df['B']
    
    In [14]: df
    Out[14]:
                 A          B        C
    one 2014-01-01 2014-02-28 -58 days
    two 2014-02-03 2014-03-01 -26 days
    

    Note: make sure you're using a recent version of pandas (e.g. 0.13.1 or later); this may not work in older versions.
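
    The session above starts from an existing df, so here is a minimal self-contained sketch of the same steps. The sample values are assumed from the output shown in this answer, not from the original poster's data.

```python
import pandas as pd

# Sample data assumed from the output printed in the answer above.
df = pd.DataFrame(
    {"A": ["2014-01-01", "2014-02-03"], "B": ["2014-02-28", "2014-03-01"]},
    index=["one", "two"],
)

# Convert the string columns to datetime (harmless if they already are).
df["A"] = pd.to_datetime(df["A"])
df["B"] = pd.to_datetime(df["B"])

# Subtracting two datetime columns yields a timedelta column.
df["C"] = df["A"] - df["B"]
print(df)
```

    Note that A - B gives negative values here; swapping the operands (df['B'] - df['A']) gives the positive difference the question asks for.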

  • 2020-11-28 03:18

    To remove the 'days' text element, you can also use the .dt accessor for Series (it is an attribute, not a method): https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.html

    So,

    df[['A','B']] = df[['A','B']].apply(pd.to_datetime) #if conversion required
    df['C'] = (df['B'] - df['A']).dt.days
    

    which returns:

                 A          B   C
    one 2014-01-01 2014-02-28  58
    two 2014-02-03 2014-03-01  26
    
  • 2020-11-28 03:26

    A list comprehension is a plain-Python way to do this (the vectorized .dt.days accessor is generally the faster option on large frames):

    [int(i.days) for i in (df.B - df.A)]
    
    1. i is each Timedelta in the result (e.g. '58 days')
    2. i.days is that value as an integer (e.g. 58)
    3. int(i.days) gives you the plain 58 you seek (on Python 3, i.days is already an int).

    If your columns aren't in datetime format, convert them first: df.A = pd.to_datetime(df.A)
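
    As a quick check, this sketch runs the list comprehension next to the .dt.days accessor on the same data and shows that both give the same day counts. The sample values are assumed from the output in the earlier answers; days_loop and days_vectorized are illustrative names.

```python
import pandas as pd

# Sample data assumed from the output shown in the earlier answers.
df = pd.DataFrame(
    {"A": ["2014-01-01", "2014-02-03"], "B": ["2014-02-28", "2014-03-01"]},
    index=["one", "two"],
)
df["A"] = pd.to_datetime(df["A"])
df["B"] = pd.to_datetime(df["B"])

diff = df["B"] - df["A"]  # timedelta column

# Element-wise: iterate over the Timedelta values and read .days from each.
days_loop = [int(td.days) for td in diff]

# Vectorized: the .dt accessor exposes the same field for the whole Series.
days_vectorized = diff.dt.days.tolist()

print(days_loop)
print(days_vectorized)
```

    Both lists hold the same integers, so either can be assigned to a new column, e.g. df['C'] = days_loop.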

  • 2020-11-28 03:27

    How about this:

    times['days_since'] = max(list(df.index.values))  
    times['days_since'] = times['days_since'] - times['months']  
    times
    