Python pandas: split a date into separate day, month, and year columns

长发绾君心 2021-01-29 11:02

I have dates in Python (pandas) written as "1/31/2010". To apply linear regression, I want three separate variables: the day, the month, and the year.

2 Answers
  • 2021-01-29 11:28

    This answers only your first question

    One solution is to extract attributes of pd.Timestamp objects using operator.attrgetter.

    The benefit of this method is that you can easily extend or change the list of attributes you require. In addition, the logic is not specific to one object type: attrgetter works on any object that exposes the named attributes.

    from operator import attrgetter
    import pandas as pd
    
    df = pd.DataFrame({'date': ['1/21/2010', '5/5/2015', '4/30/2018']})
    
    df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
    
    attr_list = ['day', 'month', 'year']
    attrs = attrgetter(*attr_list)
    df[attr_list] = df['date'].apply(attrs).apply(pd.Series)
    
    print(df)
    
            date  day  month  year
    0 2010-01-21   21      1  2010
    1 2015-05-05    5      5  2015
    2 2018-04-30   30      4  2018
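
    To illustrate how the attribute list can be extended, here is a minimal sketch that wraps the same attrgetter idea in a helper. The function name add_date_parts is hypothetical (not part of pandas), and dayofweek and quarter are simply two further pd.Timestamp attributes chosen as examples.

```python
from operator import attrgetter
import pandas as pd

def add_date_parts(df, column, attr_list):
    """Hypothetical helper: return a copy of df with one column per Timestamp attribute."""
    getter = attrgetter(*attr_list)
    # attrgetter returns a tuple for several names but a bare value for one name,
    # so wrap the single-name case in a tuple to keep the rows uniform.
    rows = [getter(ts) if len(attr_list) > 1 else (getter(ts),) for ts in df[column]]
    parts = pd.DataFrame(rows, columns=attr_list, index=df.index)
    return df.join(parts)

dates = pd.DataFrame({
    'date': pd.to_datetime(['1/21/2010', '5/5/2015', '4/30/2018'], format='%m/%d/%Y')
})
out = add_date_parts(dates, 'date', ['day', 'month', 'year', 'dayofweek', 'quarter'])
print(out)
```

    Changing the features then only means editing the list passed as attr_list. For large frames the vectorized .dt accessor is likely to be faster than a Python-level loop.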
    
  • 2021-01-29 11:39
    df['date'] = pd.to_datetime(df['date'])
    
    # Create 3 additional columns
    df['day'] = df['date'].dt.day
    df['month'] = df['date'].dt.month
    df['year'] = df['date'].dt.year
    

    Ideally, you can do this without creating 3 additional columns: just pass the Series to your function.

    In [2]: pd.to_datetime('01/31/2010').day
    Out[2]: 31
    
    In [3]: pd.to_datetime('01/31/2010').month
    Out[3]: 1
    
    In [4]: pd.to_datetime('01/31/2010').year
    Out[4]: 2010
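
    As a rough sketch of passing the date parts straight to a model without adding columns to the DataFrame, the example below builds a feature matrix from the .dt accessor and fits ordinary least squares with NumPy. The target column y and its values are made up for illustration, and the plain linear model with an intercept is an assumption, not something specified in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['1/31/2010', '2/15/2010', '3/1/2011', '7/4/2012', '12/25/2013'],
    'y': [1.0, 2.0, 3.0, 4.0, 5.0],  # hypothetical target values
})

# Parse once; df itself keeps only its original two columns
dates = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Feature matrix with columns day, month, year taken from the .dt accessor
X = np.column_stack([
    dates.dt.day.to_numpy(),
    dates.dt.month.to_numpy(),
    dates.dt.year.to_numpy(),
]).astype(float)

# Ordinary least squares with an intercept column (assumed model form)
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, df['y'].to_numpy(), rcond=None)

print(X)
print(coef)  # intercept followed by the day, month, and year coefficients
```

    The same X could be handed to any regression routine that accepts a 2-D array.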
    