ValueError: Cannot cast DatetimeIndex to dtype datetime64[us]

情话喂你 2021-02-20 06:53

I'm trying to create a PostgreSQL table of 30-minute data for the S&P 500 ETF (spy30new, for testing freshly inserted data) from a table of several stocks with 15-minute da…

3 Answers
  • 2021-02-20 07:14

    Actually, this was my data frame:

                                  Biomass  Fossil Brown coal/Lignite  Fossil Coal-derived gas  Fossil Gas  Fossil Hard coal  Fossil Oil  Geothermal  Hydro Pumped Storage  Hydro Run-of-river and poundage  Hydro Water Reservoir  Nuclear   Other  Other renewable    Solar  Waste  Wind Offshore  Wind Onshore
    2018-02-02 00:00:00+01:00   4835.0                    16275.0                    446.0      1013.0            4071.0       155.0         5.0                   7.0                           1906.0                   35.0   8924.0  3643.0            142.0      0.0  595.0         2517.0       19999.0
    2018-02-02 00:15:00+01:00   4834.0                    16272.0                    446.0      1010.0            3983.0       155.0         5.0                   7.0                           1908.0                   71.0   8996.0  3878.0            142.0      0.0  594.0         2364.0       19854.0
    2018-02-02 00:30:00+01:00   4828.0                    16393.0                    446.0      1019.0            4015.0       155.0         5.0    
    

    I was trying to insert this into an SQL database and got the same error as in the question above. What I did was convert the data frame's index into a regular column labeled 'index':

    df.reset_index(level=0, inplace=True)  
    

    Then rename the 'index' column to 'DateTime':

    df = df.rename(columns={'index': 'DateTime'})
    

    Change the datatype to 'datetime64[ns]' (the bare 'datetime64' alias is rejected by newer pandas versions):

    df['DateTime'] = df['DateTime'].astype('datetime64[ns]')
    

    Finally, store it in the SQL database using this code:

    engine = create_engine('mysql+mysqlconnector://root:Password@localhost/generation_data', echo=True)
    df.to_sql(con=engine, name='test', if_exists='replace')
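
    To make the whole sequence reproducible, here is a minimal end-to-end sketch on a toy frame (the engine URL and table name come from the answer above; the to_sql call is left commented out so the sketch runs without a database):

    ```python
    import pandas as pd

    # Toy stand-in for the generation data: a tz-aware 15-minute index.
    idx = pd.date_range("2018-02-02 00:00", periods=3, freq="15min", tz="Europe/Berlin")
    df = pd.DataFrame({"Biomass": [4835.0, 4834.0, 4828.0]}, index=idx)

    df.reset_index(level=0, inplace=True)          # index becomes a column named 'index'
    df = df.rename(columns={"index": "DateTime"})  # give it a clearer name

    # A tz-aware column cannot be cast straight to datetime64[ns] in
    # recent pandas; dropping the timezone first makes the cast safe.
    df["DateTime"] = df["DateTime"].dt.tz_localize(None).astype("datetime64[ns]")
    print(df["DateTime"].dtype)  # datetime64[ns]

    # engine = create_engine('mysql+mysqlconnector://root:Password@localhost/generation_data', echo=True)
    # df.to_sql(con=engine, name='test', if_exists='replace')
    ```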
    
  • 2021-02-20 07:26

    Using pd.to_datetime() on each element worked. Option 4, which does not work, applies pd.to_datetime() to the entire series; it produced the correct output, but sending the DataFrame to Postgres raised the ValueError in the title. Perhaps the Postgres driver understands Python datetime objects but not pandas/NumPy datetime64.

    timesAsPyDt = (spy0030Df['dt']).apply(lambda d: pd.to_datetime(str(d)))
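
    As a self-contained illustration of the element-wise conversion (spy0030Df and its 'dt' column come from the question; the frame below is a made-up stand-in):

    ```python
    import pandas as pd

    # Made-up stand-in for spy0030Df: a 'dt' column of tz-aware timestamps.
    spy0030Df = pd.DataFrame(
        {"dt": pd.date_range("2017-01-03 09:30", periods=2, freq="30min", tz="US/Eastern")}
    )

    # str(d) renders each timestamp and pd.to_datetime() parses it back,
    # one element at a time, yielding individual pd.Timestamp objects.
    timesAsPyDt = spy0030Df["dt"].apply(lambda d: pd.to_datetime(str(d)))
    print(timesAsPyDt.iloc[0])
    ```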
    
  • 2021-02-20 07:31

    I had the same problem, and applying pd.to_datetime() on each element worked as well. However, it is orders of magnitude slower than running pd.to_datetime() on the entire series. For a dataframe with over 1 million rows:

    (df['Time']).apply(lambda d: pd.to_datetime(str(d)))
    

    takes approximately 70 seconds

    and

    pd.to_datetime(df['Time'])
    

    takes approximately 0.01 seconds

    The actual problem is that timezone information is being included. To remove it:

    t = pd.to_datetime(df['Time'])
    t = t.dt.tz_localize(None)  # .dt so the values, not the series index, are localized
    

    This should be much faster!
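
    A runnable sketch of the vectorized fix (the 'Time' column name follows the answer; the frame itself is invented here). Note the .dt accessor, which localizes the column's values rather than the series index:

    ```python
    import pandas as pd

    # Invented frame with a tz-aware 'Time' column.
    df = pd.DataFrame(
        {"Time": pd.date_range("2018-02-02", periods=4, freq="15min", tz="Europe/Berlin")}
    )

    t = pd.to_datetime(df["Time"])   # vectorized parse; cheap even at 1M+ rows
    t = t.dt.tz_localize(None)       # strip the timezone from the values
    print(t.dtype)  # datetime64[ns]
    ```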
