I'm trying to create a PostgreSQL table of 30-minute data for the S&P 500 ETF (spy30new, for testing freshly inserted data) from a table of several stocks with 15-minute data.
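For reference, a minimal sketch of the workflow I'm after, assuming hypothetical names (a source table fifteen_min_bars with a symbol column, a dt timestamp column, and OHLCV columns, plus local Postgres credentials):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://user:password@localhost/mydb')

# Load the 15-minute SPY bars, indexed by timestamp
df = pd.read_sql("SELECT dt, open, high, low, close, volume FROM fifteen_min_bars WHERE symbol = 'SPY'",
                 engine, parse_dates=['dt'], index_col='dt')

# Aggregate to 30-minute bars with the usual OHLCV rules
bars30 = df.resample('30min').agg({'open': 'first', 'high': 'max', 'low': 'min',
                                   'close': 'last', 'volume': 'sum'}).dropna()

bars30.to_sql('spy30new', engine, if_exists='replace')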
Actually, this was my data frame:

Columns (one per generation type): Biomass, Fossil Brown coal/Lignite, Fossil Coal-derived gas, Fossil Gas, Fossil Hard coal, Fossil Oil, Geothermal, Hydro Pumped Storage, Hydro Run-of-river and poundage, Hydro Water Reservoir, Nuclear, Other, Other renewable, Solar, Waste, Wind Offshore, Wind Onshore

2018-02-02 00:00:00+01:00   4835.0  16275.0  446.0  1013.0  4071.0  155.0  5.0  7.0  1906.0  35.0  8924.0  3643.0  142.0  0.0  595.0  2517.0  19999.0
2018-02-02 00:15:00+01:00   4834.0  16272.0  446.0  1010.0  3983.0  155.0  5.0  7.0  1908.0  71.0  8996.0  3878.0  142.0  0.0  594.0  2364.0  19854.0
2018-02-02 00:30:00+01:00   4828.0  16393.0  446.0  1019.0  4015.0  155.0  5.0
I was trying to insert into an SQL database but was getting the same error as in the question above. What I did was convert the index of the data frame to a regular column labeled 'index':
df.reset_index(level=0, inplace=True)
Then rename the 'index' column to 'DateTime' using this code:
df = df.rename(columns={'index': 'DateTime'})
Change the datatype to 'datetime64':
df['DateTime'] = df['DateTime'].astype('datetime64[ns]')
Finally, store it in the SQL database using this code:
engine = create_engine('mysql+mysqlconnector://root:Password@localhost/generation_data', echo=True)
df.to_sql(con=engine, name='test', if_exists='replace')
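To check that the insert worked, the table can be read back (a quick sanity check, assuming the same engine as above):

check = pd.read_sql('SELECT * FROM test LIMIT 5', con=engine)
print(check.dtypes)  # DateTime should come back as datetime64[ns]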
Using pd.to_datetime() on each element worked. Option 4, which applies pd.to_datetime() to the entire series, does not: it produced the correct output, but I got a ValueError (see title) when sending the DataFrame to Postgres. Perhaps the Postgres driver understands Python's datetime, but not the datetime64 type from pandas and NumPy.
timesAsPyDt = (spy0030Df['dt']).apply(lambda d: pd.to_datetime(str(d)))
I had the same problem, and applying pd.to_datetime() to each element worked for me as well. But it is orders of magnitude slower than running pd.to_datetime() on the entire series. For a DataFrame with over 1 million rows:
(df['Time']).apply(lambda d: pd.to_datetime(str(d)))
takes approximately 70 seconds
and
pd.to_datetime(df['Time'])
takes approximately 0.01 seconds
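To reproduce the comparison, here is a rough timing sketch (assuming a DataFrame df whose 'Time' column holds timestamps; the absolute numbers will vary by machine):

import time
import pandas as pd

start = time.perf_counter()
slow = df['Time'].apply(lambda d: pd.to_datetime(str(d)))  # element-wise conversion
print('apply: %.2f s' % (time.perf_counter() - start))

start = time.perf_counter()
fast = pd.to_datetime(df['Time'])  # vectorized conversion
print('vectorized: %.4f s' % (time.perf_counter() - start))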
The actual problem is that timezone information is being included. To remove it:
t = pd.to_datetime(df['Time'])
t = t.dt.tz_localize(None)  # use the .dt accessor to strip the tz from the values, not the index
This should be much faster!
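For example, here is a small self-contained sketch with made-up tz-aware timestamps; the key point is the dtype change from tz-aware to naive:

import pandas as pd

s = pd.Series(pd.to_datetime(['2018-02-02 00:00:00+01:00', '2018-02-02 00:15:00+01:00']))
print(s.dtype)  # tz-aware, e.g. datetime64[ns, UTC+01:00]
naive = s.dt.tz_localize(None)  # drop the timezone information
print(naive.dtype)  # datetime64[ns], which the driver accepts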