pandas-to-sql

Connecting to Teradata using Python

血红的双手。 submitted on 2019-12-10 00:14:49
I am trying to connect to a Teradata server and load a DataFrame into a table using Python. Here is my code:

    import sqlalchemy
    engine = sqlalchemy.create_engine("teradata://username:password@hostname:port/")
    f3.to_sql(con=engine, name='sample', if_exists='replace', schema='schema_name')

But I am getting the following error:

    InterfaceError: (teradata.api.InterfaceError) ('DRIVER_NOT_FOUND', "No driver found for 'Teradata'. Available drivers: SQL Server, SQL Server Native Client 11.0, ODBC Driver 13 for SQL Server")

Can anybody help me figure out what is wrong with my approach?
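The DRIVER_NOT_FOUND error means the teradata/ODBC dialect cannot find a Teradata ODBC driver on the machine. One possible workaround (a minimal sketch, not necessarily the only fix) is to switch to the teradatasqlalchemy dialect, which uses the pure-Python teradatasql driver and therefore needs no ODBC installation; the hostname, credentials, schema, and DataFrame below are placeholders.

    # Minimal sketch, assuming `pip install teradatasqlalchemy` and real
    # credentials in place of the placeholders below.
    import pandas as pd
    import sqlalchemy

    # The teradatasql:// dialect talks to Teradata over its pure-Python driver,
    # so no ODBC driver needs to be installed or configured.
    engine = sqlalchemy.create_engine("teradatasql://username:password@hostname")

    f3 = pd.DataFrame({"a": [1, 2, 3]})  # stand-in for the real DataFrame
    f3.to_sql(
        name="sample",
        con=engine,
        schema="schema_name",
        if_exists="replace",
        index=False,
    )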

Pandas to_sql() to update unique values in DB?

我是研究僧i submitted on 2019-12-07 16:45:49
How can I use df.to_sql(if_exists='append') to append ONLY the rows that are not already in the database table? In other words, I would like to find the duplicates between the DataFrame and the database and drop them before writing to the database. Is there a parameter for this? I understand that if_exists='append' and if_exists='replace' apply to the entire table, not to individual rows. I am using SQLAlchemy and a pandas DataFrame.
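to_sql() has no built-in parameter for skipping duplicates, so one common workaround is to filter the DataFrame against the keys already stored in the table before appending. A minimal sketch, assuming the table already exists, that it is named records, and that id is its unique key (all placeholder names):

    # Minimal sketch: append only rows whose key is not already in the table.
    # The connection URL, table name "records", and key column "id" are placeholders.
    import pandas as pd
    import sqlalchemy

    engine = sqlalchemy.create_engine("sqlite:///example.db")

    df = pd.DataFrame({"id": [1, 2, 3, 4], "value": ["a", "b", "c", "d"]})

    # Pull the keys that are already stored, then keep only the unseen rows.
    existing = pd.read_sql("SELECT id FROM records", engine)
    new_rows = df[~df["id"].isin(existing["id"])]

    new_rows.to_sql("records", engine, if_exists="append", index=False)

For large tables it is usually cheaper to let the database do the de-duplication instead, for example by appending into a staging table and then running an INSERT ... ON CONFLICT DO NOTHING (PostgreSQL) or MERGE statement.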

Create sql table from dask dataframe using map_partitions and pd.df.to_sql

冷暖自知 submitted on 2019-12-07 01:07:34
Dask does not have a df.to_sql() like pandas, so I am trying to replicate that functionality and create an SQL table using the map_partitions method. Here is my code:

    import dask.dataframe as dd
    import pandas as pd
    import sqlalchemy as sqla

    db_url = 'my_db_url_connection'
    conn = sqla.create_engine(db_url)

    ddf = dd.read_csv('data/prod.csv')
    meta = dict(ddf.dtypes)
    ddf.map_partitions(
        lambda df: df.to_sql('table_name', db_url, if_exists='append', index=True),
        ddf,
        meta=meta,
    )
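One pattern that is often suggested for this (a sketch under assumptions, not necessarily the asker's final solution) is to create the table once from an empty pandas frame so the schema exists, then let each partition append itself inside map_partitions and trigger the writes with compute(). The connection URL, CSV path, and table name are placeholders.

    # Minimal sketch, assuming a reachable database at db_url and a CSV at data/prod.csv.
    import dask.dataframe as dd
    import pandas as pd
    import sqlalchemy as sqla

    db_url = "postgresql://user:password@host:5432/mydb"   # placeholder URL
    table = "table_name"

    ddf = dd.read_csv("data/prod.csv")

    # Create the (empty) table once, up front, so the partitions only ever append.
    schema_frame = ddf.head(1).iloc[:0]   # empty pandas frame with the CSV's columns/dtypes
    engine = sqla.create_engine(db_url)
    schema_frame.to_sql(table, engine, if_exists="replace", index=False)
    engine.dispose()

    def write_partition(df):
        # Each partition opens its own engine: engine objects cannot be shared across workers.
        part_engine = sqla.create_engine(db_url)
        df.to_sql(table, part_engine, if_exists="append", index=False)
        part_engine.dispose()
        return pd.DataFrame({"rows_written": [len(df)]})

    # meta describes the small DataFrame each partition returns.
    result = ddf.map_partitions(write_partition, meta={"rows_written": "int64"}).compute()
    print(result["rows_written"].sum(), "rows written")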

duplicate key value violates unique constraint - postgres error when trying to create sql table from dask dataframe

爱⌒轻易说出口 submitted on 2019-12-01 10:12:16
Following on from this question: when I try to create a PostgreSQL table from a dask.dataframe with more than one partition, I get the following error:

    IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "pg_type_typname_nsp_index"
    DETAIL:  Key (typname, typnamespace)=(test1, 2200) already exists.
    [SQL:
    CREATE TABLE test1 (
        "A" BIGINT,
        "B" BIGINT,
        "C" BIGINT,
        "D" BIGINT,
        "E" BIGINT,
        "F" BIGINT,
        "G" BIGINT,
        "H" BIGINT,
        "I" BIGINT,
        "J" BIGINT,
        idx BIGINT
    )]
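The usual explanation for this error is that every partition runs its own CREATE TABLE test1 concurrently, and the parallel CREATEs race inside PostgreSQL's pg_type catalog. A hedged sketch of one workaround (placeholder connection URL and data) is to write the partitions sequentially, letting only the first one create the table:

    # Minimal sketch: write partitions one at a time so only a single
    # CREATE TABLE / INSERT runs against PostgreSQL at any moment.
    import dask.dataframe as dd
    import numpy as np
    import pandas as pd
    import sqlalchemy as sqla

    db_url = "postgresql://user:password@localhost:5432/mydb"   # placeholder
    engine = sqla.create_engine(db_url)

    # A small frame split into several partitions, similar to the failing setup.
    pdf = pd.DataFrame(np.random.randint(0, 100, size=(1000, 10)), columns=list("ABCDEFGHIJ"))
    ddf = dd.from_pandas(pdf, npartitions=4)

    for i, part in enumerate(ddf.to_delayed()):
        chunk = part.compute()   # materialise one partition as a pandas DataFrame
        chunk.to_sql(
            "test1",
            engine,
            # Only the first chunk creates the table; the rest append to it.
            if_exists="replace" if i == 0 else "append",
            index=True,
            index_label="idx",
        )

Pre-creating the table once from an empty pandas frame before mapping an append over the partitions (as in the earlier sketch) avoids the race in the same way while keeping the writes parallel.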

Speeding up pandas.DataFrame.to_sql with fast_executemany of pyODBC

谁都会走 submitted on 2019-11-26 15:50:34
I would like to send a large pandas.DataFrame to a remote server running MS SQL. The way I do it now is by converting a data_frame object to a list of tuples and then sending it off with pyODBC's executemany() function. It goes something like this:

    import pyodbc as pdb

    list_of_tuples = convert_df(data_frame)

    connection = pdb.connect(cnxn_str)
    cursor = connection.cursor()
    cursor.fast_executemany = True

    cursor.executemany(sql_statement, list_of_tuples)
    connection.commit()

    cursor.close()
    connection.close()

I then started to wonder whether things could be sped up (or at least made more readable) by using the data_frame.to_sql() method.
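A commonly cited pattern here (a sketch, not necessarily the answer the asker settled on) is to keep df.to_sql() for readability and enable pyODBC's fast_executemany through SQLAlchemy, which batches the parameterised INSERTs instead of sending them row by row. The server name, credentials, driver string, and table name below are placeholders.

    # Minimal sketch, assuming SQLAlchemy 1.3+ with the mssql+pyodbc dialect
    # and a reachable SQL Server; the DSN parts and table name are placeholders.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(
        "mssql+pyodbc://user:password@server/database?driver=ODBC+Driver+17+for+SQL+Server",
        fast_executemany=True,   # let pyODBC send the INSERTs in batches
    )

    data_frame = pd.DataFrame({"a": range(1000), "b": range(1000)})
    data_frame.to_sql("sample_table", engine, if_exists="replace", index=False, chunksize=10_000)

On SQLAlchemy versions older than 1.3 the same effect is usually achieved by setting cursor.fast_executemany = True inside a before_cursor_execute event listener on the engine.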