I have dates in Python (pandas) written as "1/31/2010". To apply linear regression, I want 3 separate variables: the day, the month, and the year.
This answers only your first question.
One solution is to extract attributes of pd.Timestamp objects using operator.attrgetter.
The benefit of this method is you can easily expand / change the attributes you require. In addition, the logic is not specific to object type.
from operator import attrgetter
import pandas as pd
df = pd.DataFrame({'date': ['1/21/2010', '5/5/2015', '4/30/2018']})
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
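# Attributes to pull from each pd.Timestamp; extend this list as needed
attr_list = ['day', 'month', 'year']
attrs = attrgetter(*attr_list)
# attrs returns one tuple per row; apply(pd.Series) expands it into columns
df[attr_list] = df['date'].apply(attrs).apply(pd.Series)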
print(df)
        date  day  month  year
0 2010-01-21   21      1  2010
1 2015-05-05    5      5  2015
2 2018-04-30   30      4  2018
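To illustrate the point about expanding the attribute list, here is a minimal sketch that adds dayofweek and quarter (both are standard pd.Timestamp attributes; choosing them is my assumption, not part of the question). It builds the new columns from a list of tuples and joins them on, which is a swapped-in alternative to the .apply(pd.Series) step above rather than the same call.

```python
from operator import attrgetter
import pandas as pd

df = pd.DataFrame({'date': ['1/21/2010', '5/5/2015', '4/30/2018']})
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Only this list changes when you need different attributes.
# 'dayofweek' (Monday=0) and 'quarter' are illustrative additions.
attr_list = ['day', 'month', 'year', 'dayofweek', 'quarter']
attrs = attrgetter(*attr_list)

# Iterating a datetime column yields pd.Timestamp objects;
# attrs(ts) returns one tuple of values per row.
values = [attrs(ts) for ts in df['date']]
features = pd.DataFrame(values, index=df.index, columns=attr_list)

df = df.join(features)
print(df)
```

Only attr_list needs editing to change which columns are produced; the rest of the code stays the same.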
df['date'] = pd.to_datetime(df['date'])
# Create 3 additional columns
df['day'] = df['date'].dt.day
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
Ideally, you can do this without creating 3 additional columns: just pass the Series (for example, df['date'].dt.day) to your function.
In [2]: pd.to_datetime('01/31/2010').day
Out[2]: 31
In [3]: pd.to_datetime('01/31/2010').month
Out[3]: 1
In [4]: pd.to_datetime('01/31/2010').year
Out[4]: 2010
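The same attributes are available for a whole column through the .dt accessor, so the features can be built on the fly and handed to whatever consumes them. A rough sketch, assuming a hypothetical helper named date_features (not part of pandas) and leaving the regression step itself out:

```python
import pandas as pd

df = pd.DataFrame({'date': ['1/21/2010', '5/5/2015', '4/30/2018']})
dates = pd.to_datetime(df['date'], format='%m/%d/%Y')

def date_features(s):
    """Hypothetical helper: return day/month/year of a datetime Series
    as a separate DataFrame, without modifying the original frame."""
    return pd.DataFrame(
        {'day': s.dt.day, 'month': s.dt.month, 'year': s.dt.year},
        index=s.index,
    )

# X can be passed as the feature matrix to a regression routine.
X = date_features(dates)
print(X)
```

Here df keeps only its original date column, and X holds the three numeric variables.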