I have a df time series. I extracted the indexes and want to convert them each to datetime. How do you go about doing that? I tried to use pa
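If the goal is to turn the index of a time-series DataFrame into plain `datetime.datetime` objects, `DatetimeIndex.to_pydatetime()` does the whole index at once. A minimal sketch, assuming the index is already a `DatetimeIndex`; the DataFrame and its values below are hypothetical:

```python
import datetime as dt

import pandas as pd

# Hypothetical example data: a small daily time series indexed by date.
df = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=pd.date_range("2014-11-19", periods=3, freq="D"),
)

# DatetimeIndex.to_pydatetime() returns a NumPy object array of datetime.datetime.
py_dates = df.index.to_pydatetime()

# Element-wise equivalent: convert each Timestamp in the index individually.
py_dates_list = [ts.to_pydatetime() for ts in df.index]

print(type(py_dates[0]))  # <class 'datetime.datetime'>
print(py_dates_list[0])   # 2014-11-19 00:00:00
```

The list comprehension is handy when you only need some of the entries or want a regular Python list rather than an array.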
I had the same issue and tried the solution from @aikramer2 of adding a column of type 'datetime.datetime' to my df, but again I got a pandas data type:
# Libraries used
import pandas as pd
import datetime as dt
# Load data from a local tab-separated file into a pandas DataFrame; column [1] holds the datetime string
savedtweets = pd.read_csv('/Users/sharon/Documents/ipython/twitter_analysis/conftwit.csv', sep='\t',
                          names=['id', 'created_at_string', 'user.screen_name', 'text'],
                          parse_dates={"created_at": [1]})
print(int(savedtweets['id'].max()))  # 535073416026816512
print(type(savedtweets['created_at'][0]))  # <class 'pandas.tslib.Timestamp'>
# Add a column built explicitly with the datetime.datetime constructor
savedtweets['datetime'] = savedtweets['created_at'].apply(lambda x: dt.datetime(x.year, x.month, x.day))
print(type(savedtweets['datetime'][0]))  # still <class 'pandas.tslib.Timestamp'>
I suspect a pandas DataFrame cannot store a datetime.datetime data type. I had success when I stored the datetime.datetime values in a plain Python list instead:
savedtweets = pd.read_csv('/Users/swragg/Documents/ipython/twitter_analysis/conftwit.csv', sep='\t',
                          names=['id', 'created_at_string', 'user.screen_name', 'text'],
                          parse_dates={"created_at": [1]})
print(int(savedtweets['id'].max()))  # 535073416026816512
print(type(savedtweets['created_at'][0]))  # <class 'pandas.tslib.Timestamp'>
# Rebuild each value as a datetime.datetime and keep the results in a plain list
savedtweets_datetime = [dt.datetime(x.year, x.month, x.day, x.hour, x.minute, x.second)
                        for x in savedtweets['created_at']]
print(savedtweets_datetime[0])  # 2014-11-19 14:13:38
print(savedtweets['created_at'][0])  # 2014-11-19 14:13:38
print(type(dt.datetime(2014, 3, 5, 2, 4)))  # <class 'datetime.datetime'>
print(type(savedtweets['created_at'][0].year))  # <class 'int'>
print(type(savedtweets_datetime))  # <class 'list'>
As an alternative solution, if you have two separate fields (one for the date, one for the time):
Convert to datetime.date:
df['date2'] = pd.to_datetime(df['date']).dt.date
Convert to datetime.time:
df['time2'] = pd.to_datetime(df['time']).dt.time
Afterwards you can combine them. pd.datetime was removed in pandas 2.0, so use the standard library's datetime.datetime.combine:
import datetime as dt
df['datetime'] = df.apply(lambda r: dt.datetime.combine(r['date2'], r['time2']), axis=1)
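A self-contained sketch of the same date-plus-time combination, assuming hypothetical string columns named `date` and `time`. It passes an explicit `format` when parsing the time-only strings and iterates with `zip()` instead of a row-wise `apply`; both are choices made for this sketch, not part of the answer above:

```python
import datetime as dt

import pandas as pd

# Hypothetical input: separate date and time columns stored as strings.
df = pd.DataFrame({"date": ["2014-11-19", "2014-11-20"],
                   "time": ["14:13:38", "09:05:00"]})

# Parse each column, then keep only the date part or the time part.
df["date2"] = pd.to_datetime(df["date"]).dt.date
df["time2"] = pd.to_datetime(df["time"], format="%H:%M:%S").dt.time

# Combine each (date, time) pair with the standard library.
df["datetime"] = [dt.datetime.combine(d, t) for d, t in zip(df["date2"], df["time2"])]

print(df[["date", "time", "datetime"]])
```

Note that pandas stores the combined column as its own datetime64 type, so reading an element back gives a Timestamp, which compares equal to the corresponding datetime.datetime.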
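Adapted from this post:
import time
time.strftime("%H:%M", time.strptime(str(x), "%Y-%m-%d %H:%M:%S"))
Note: x should be a pandas.tslib.Timestamp (as it is in the question). The result is an "HH:MM" string, not a datetime object, and the format string only matches Timestamps without fractional seconds.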
Just an update to the question: I tried the most upvoted answer, and it gives me this warning:
usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py:2910: FutureWarning: to_datetime is deprecated. Use self.to_pydatetime()
  exec(code_obj, self.user_global_ns, self.user_ns)
It suggests using to_pydatetime() instead. For example:
import pandas as pd
sample = pd.Timestamp('2018-05-02 10:08:54.774000')
sample.to_pydatetime()
will return datetime.datetime(2018, 5, 2, 10, 8, 54, 774000)
This works for me to create a date string for inserting into MySQL:
pandas_tslib = pandas_tslib.to_pydatetime()
pandas_tslib = "'" + pandas_tslib.strftime('%Y-%m-%d') + "'"
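Instead of wrapping the string in quotes by hand, the date string can be passed as a query parameter so the database driver handles quoting. The sketch below swaps in Python's built-in `sqlite3` module as a stand-in for MySQL, since it runs without a server; the table and column names are hypothetical. MySQL drivers such as PyMySQL or mysql-connector typically use `%s` placeholders rather than `?`, so check your driver's paramstyle:

```python
import sqlite3

import pandas as pd

# Hypothetical value standing in for pandas_tslib.
pandas_tslib = pd.Timestamp("2018-05-02 10:08:54.774000")
day = pandas_tslib.to_pydatetime().strftime("%Y-%m-%d")  # plain '2018-05-02', no quotes

# sqlite3 stands in for MySQL here; it uses ? placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (created_on TEXT)")  # hypothetical table
# The driver quotes the parameter itself, so no manual "'" + ... + "'" is needed.
conn.execute("INSERT INTO tweets (created_on) VALUES (?)", (day,))
row = conn.execute("SELECT created_on FROM tweets").fetchone()
print(row)  # ('2018-05-02',)
conn.close()
```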
In my case I could not get a correct output even when specifying the format: I always got the year 1970.
What actually solved my problem was specifying the unit parameter, since my timestamps have seconds granularity:
import pandas as pd
df_new = df.copy()  # copy, so that modifying df_new does not also modify df
df_new['time'] = pd.to_datetime(df['time'], unit='s')
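The 1970 dates come from how `pd.to_datetime` reads integers: without `unit`, it treats them as nanoseconds since the Unix epoch, so a value counting seconds lands a second or two after 1970-01-01. A minimal sketch with hypothetical epoch-second values:

```python
import pandas as pd

# Hypothetical data: Unix epoch timestamps in whole seconds.
df = pd.DataFrame({"time": [1416406418, 1416474300]})

# Without unit=, integers are interpreted as nanoseconds since the epoch.
wrong = pd.to_datetime(df["time"])
print(wrong[0])  # 1970-01-01 00:00:01.416406418

# unit="s" tells pandas the integers count seconds since 1970-01-01.
df_new = df.copy()
df_new["time"] = pd.to_datetime(df["time"], unit="s")
print(df_new["time"][0])  # 2014-11-19 14:13:38
```

If your values are in milliseconds instead, `unit="ms"` applies the same idea.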