pandas-to-sql

Connecting to Teradata using Python

血红的双手。 submitted on 2019-12-10 00:14:49
I am trying to connect to a Teradata server and load a DataFrame into a table using Python. Here is my code:

    import sqlalchemy
    engine = sqlalchemy.create_engine("teradata://username:password@hostname:port/")
    f3.to_sql(con=engine, name='sample', if_exists='replace', schema='schema_name')

But I am getting the following error:

    InterfaceError: (teradata.api.InterfaceError) ('DRIVER_NOT_FOUND', "No driver found for 'Teradata'. Available drivers: SQL Server, SQL Server Native Client 11.0, ODBC Driver 13 for SQL Server")

Can anybody help me figure out what is wrong with my approach?
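The DRIVER_NOT_FOUND error means the teradata/ODBC dialect cannot find a Teradata ODBC driver on the machine. One possible workaround (a minimal sketch, not necessarily the only fix) is to switch to the teradatasqlalchemy dialect, which uses the pure-Python teradatasql driver and therefore needs no ODBC installation; the hostname, credentials, schema, and DataFrame below are placeholders.

    # Minimal sketch, assuming `pip install teradatasqlalchemy` and real
    # credentials in place of the placeholders below.
    import pandas as pd
    import sqlalchemy

    # The teradatasql:// dialect talks to Teradata over its pure-Python driver,
    # so no ODBC driver needs to be installed or configured.
    engine = sqlalchemy.create_engine("teradatasql://username:password@hostname")

    f3 = pd.DataFrame({"a": [1, 2, 3]})  # stand-in for the real DataFrame
    f3.to_sql(
        name="sample",
        con=engine,
        schema="schema_name",
        if_exists="replace",
        index=False,
    )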

Pandas to_sql() to update unique values in DB?

我是研究僧i submitted on 2019-12-07 16:45:49
How can I use df.to_sql(if_exists='append') to append ONLY the rows that are not already in the database table? In other words, I would like to find the duplicates between the DataFrame and the database and drop them before writing to the database. Is there a parameter for this? I understand that if_exists='append' and if_exists='replace' apply to the entire table, not to individual rows. I am using SQLAlchemy and a pandas DataFrame.
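to_sql() has no built-in parameter for skipping duplicates, so one common workaround is to filter the DataFrame against the keys already stored in the table before appending. A minimal sketch, assuming the table already exists, that it is named records, and that id is its unique key (all placeholder names):

    # Minimal sketch: append only rows whose key is not already in the table.
    # The connection URL, table name "records", and key column "id" are placeholders.
    import pandas as pd
    import sqlalchemy

    engine = sqlalchemy.create_engine("sqlite:///example.db")

    df = pd.DataFrame({"id": [1, 2, 3, 4], "value": ["a", "b", "c", "d"]})

    # Pull the keys that are already stored, then keep only the unseen rows.
    existing = pd.read_sql("SELECT id FROM records", engine)
    new_rows = df[~df["id"].isin(existing["id"])]

    new_rows.to_sql("records", engine, if_exists="append", index=False)

For large tables it is usually cheaper to let the database do the de-duplication instead, for example by appending into a staging table and then running an INSERT ... ON CONFLICT DO NOTHING (PostgreSQL) or MERGE statement.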

Create sql table from dask dataframe using map_partitions and pd.df.to_sql

冷暖自知 submitted on 2019-12-07 01:07:34
Dask does not have a df.to_sql() like pandas, so I am trying to replicate that functionality and create an SQL table using the map_partitions method. Here is my code:

    import dask.dataframe as dd
    import pandas as pd
    import sqlalchemy as sqla

    db_url = 'my_db_url_connection'
    conn = sqla.create_engine(db_url)

    ddf = dd.read_csv('data/prod.csv')
    meta = dict(ddf.dtypes)
    ddf.map_partitions(
        lambda df: df.to_sql('table_name', db_url, if_exists='append', index=True),
        ddf,
        meta=meta,
    )
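One pattern that is often suggested for this (a sketch under assumptions, not necessarily the asker's final solution) is to create the table once from an empty pandas frame so the schema exists, then let each partition append itself inside map_partitions and trigger the writes with compute(). The connection URL, CSV path, and table name are placeholders.

    # Minimal sketch, assuming a reachable database at db_url and a CSV at data/prod.csv.
    import dask.dataframe as dd
    import pandas as pd
    import sqlalchemy as sqla

    db_url = "postgresql://user:password@host:5432/mydb"   # placeholder URL
    table = "table_name"

    ddf = dd.read_csv("data/prod.csv")

    # Create the (empty) table once, up front, so the partitions only ever append.
    schema_frame = ddf.head(1).iloc[:0]   # empty pandas frame with the CSV's columns/dtypes
    engine = sqla.create_engine(db_url)
    schema_frame.to_sql(table, engine, if_exists="replace", index=False)
    engine.dispose()

    def write_partition(df):
        # Each partition opens its own engine: engine objects cannot be shared across workers.
        part_engine = sqla.create_engine(db_url)
        df.to_sql(table, part_engine, if_exists="append", index=False)
        part_engine.dispose()
        return pd.DataFrame({"rows_written": [len(df)]})

    # meta describes the small DataFrame each partition returns.
    result = ddf.map_partitions(write_partition, meta={"rows_written": "int64"}).compute()
    print(result["rows_written"].sum(), "rows written")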

duplicate key value violates unique constraint - postgres error when trying to create sql table from dask dataframe

爱⌒轻易说出口 submitted on 2019-12-01 10:12:16
Following on from this question: when I try to create a PostgreSQL table from a dask.dataframe with more than one partition, I get the following error:

    IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "pg_type_typname_nsp_index"
    DETAIL:  Key (typname, typnamespace)=(test1, 2200) already exists.
    [SQL:
    CREATE TABLE test1 (
        "A" BIGINT,
        "B" BIGINT,
        "C" BIGINT,
        "D" BIGINT,
        "E" BIGINT,
        "F" BIGINT,
        "G" BIGINT,
        "H" BIGINT,
        "I" BIGINT,
        "J" BIGINT,
        idx BIGINT
    )]
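The usual explanation for this error is that every partition runs its own CREATE TABLE test1 concurrently, and the parallel CREATEs race inside PostgreSQL's pg_type catalog. A hedged sketch of one workaround (placeholder connection URL and data) is to write the partitions sequentially, letting only the first one create the table:

    # Minimal sketch: write partitions one at a time so only a single
    # CREATE TABLE / INSERT runs against PostgreSQL at any moment.
    import dask.dataframe as dd
    import numpy as np
    import pandas as pd
    import sqlalchemy as sqla

    db_url = "postgresql://user:password@localhost:5432/mydb"   # placeholder
    engine = sqla.create_engine(db_url)

    # A small frame split into several partitions, similar to the failing setup.
    pdf = pd.DataFrame(np.random.randint(0, 100, size=(1000, 10)), columns=list("ABCDEFGHIJ"))
    ddf = dd.from_pandas(pdf, npartitions=4)

    for i, part in enumerate(ddf.to_delayed()):
        chunk = part.compute()   # materialise one partition as a pandas DataFrame
        chunk.to_sql(
            "test1",
            engine,
            # Only the first chunk creates the table; the rest append to it.
            if_exists="replace" if i == 0 else "append",
            index=True,
            index_label="idx",
        )

Pre-creating the table once from an empty pandas frame before mapping an append over the partitions (as in the earlier sketch) avoids the race in the same way while keeping the writes parallel.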

Speeding up pandas.DataFrame.to_sql with fast_executemany of pyODBC

谁都会走 submitted on 2019-11-26 15:50:34
I would like to send a large pandas.DataFrame to a remote server running MS SQL. The way I do it now is by converting a data_frame object to a list of tuples and then sending it off with pyODBC's executemany() function. It goes something like this:

    import pyodbc as pdb

    list_of_tuples = convert_df(data_frame)

    connection = pdb.connect(cnxn_str)
    cursor = connection.cursor()
    cursor.fast_executemany = True

    cursor.executemany(sql_statement, list_of_tuples)
    connection.commit()

    cursor.close()
    connection.close()

I then started to wonder whether things could be sped up (or at least made more readable) by using the data_frame.to_sql() method.
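A commonly cited pattern here (a sketch, not necessarily the answer the asker settled on) is to keep df.to_sql() for readability and enable pyODBC's fast_executemany through SQLAlchemy, which batches the parameterised INSERTs instead of sending them row by row. The server name, credentials, driver string, and table name below are placeholders.

    # Minimal sketch, assuming SQLAlchemy 1.3+ with the mssql+pyodbc dialect
    # and a reachable SQL Server; the DSN parts and table name are placeholders.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(
        "mssql+pyodbc://user:password@server/database?driver=ODBC+Driver+17+for+SQL+Server",
        fast_executemany=True,   # let pyODBC send the INSERTs in batches
    )

    data_frame = pd.DataFrame({"a": range(1000), "b": range(1000)})
    data_frame.to_sql("sample_table", engine, if_exists="replace", index=False, chunksize=10_000)

On SQLAlchemy versions older than 1.3 the same effect is usually achieved by setting cursor.fast_executemany = True inside a before_cursor_execute event listener on the engine.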