Python/ SQL : replacing the empty strings of a DataFrame by a “Null” value to insert the data in a database

家住魔仙堡 提交于 2021-01-20 08:48:02

问题


Let's say that I have this dataframe :

REFERENCE = ["GZF882348G", "SFGUZBJLNJU", "FTLNGZ242112", "DFBHGVGHG543"]
IBAN = ["FR7343563", "FR4832545", "FR9858331", "FR2001045"]
DEBIT = [26, '', 856, '']
CREDIT = ['', 324, '', 876]
MONTANT = [641, 33, '', 968]

df = pd.DataFrame({'Référence' : REFERENCE, 'IBAN' : IBAN, 'Débit' : DEBIT, 'Crédit' : CREDIT, 'Montant' : MONTANT})

I have a problem of format to insert this kind of data in my database. The columns "Débit", "Crédit", "Montant" are defined to get floats as data. However the data of these columns are not only integers, I have empty strings too and that is my issue. I know that I have to write a condition that replace a empty string by a "Null" value in the SQL format, however I do not know how to do that in python or in SQL. I am discovering/learning the SQL environment.

Here is my code :

import pandas as pd
import pyodbc 

server = '...'
database = '...'
username = '...' 
password = '...'
driver = '...'

connection = pyodbc.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+password)
cursor = connection.cursor()

for i, row in df.iterrows():


    sql_exe = "INSERT INTO dbo.tbl_data_xml (Réference,IBAN,Débit,Crédit,Montant) VALUES (?,?,?,?,?)"
    cursor.execute(sql_exe, tuple(row))
    
    connection.commit()

Anyone can help me please.

Thank you


回答1:


You appear to be mixing types in Pandas data frame where string, '', is combined with integer in the same column as evidenced by all object types. In relational databases you cannot mix data types. And converting '' to string 'NULL' will not resolve your issue. In SQL, NULL <> 'NULL'

df.dtypes

# Référence    object
# IBAN         object
# Débit        object
# Crédit       object
# Montant      object
# dtype: object

Therefore, convert columns to numeric with pd.to_numeric where empty string, '', converts to NaN which this entity should translate to SQL's NULL entity.

df[['Débit', 'Crédit', 'Montant']] = df[['Débit', 'Crédit', 'Montant']].apply(pd.to_numeric)

df.dtypes
# Référence     object
# IBAN          object
# Débit        float64
# Crédit       float64
# Montant      float64
# dtype: object

df
#       Référence       IBAN  Débit  Crédit  Montant
# 0    GZF882348G  FR7343563   26.0     NaN    641.0
# 1   SFGUZBJLNJU  FR4832545    NaN   324.0     33.0
# 2  FTLNGZ242112  FR9858331  856.0     NaN      NaN
# 3  DFBHGVGHG543  FR2001045    NaN   876.0    968.0

Then run your query. In fact, avoid the slower for loop with iterrows and consider df.to_numpy + cursor.executemany.

# PREPARED STATEMENT
sql_exe = "INSERT INTO dbo.tbl_data_xml (Réference,IBAN,Débit,Crédit,Montant) VALUES (?,?,?,?,?)"

# CONVERT DATA TO LIST OF NUMPY ARRAYS
sql_data = df.where(pd.notnull(df), None).to_numpy().replace(.tolist()

# EXECUTE ACTION QUERY
cursor.executemany(sql_exe, sql_data)
connection.commit()



回答2:


You can use Pandas.DataFrame.to_sql such as

df.to_sql('dbo.tbl_data_xml', con=connection, if_exists='append', index=False )

where append option stands for inserting new values to the table, if pandas version is 0.15+




回答3:


You could do:

df.loc[df['Débit'].eq(''), 'Débit'] = 'NULL'
df.loc[df['Crédit'].eq(''), 'Crédit'] = 'NULL'
df.loc[df['Montant'].eq(''), 'Montant'] = 'NULL'

print(df)

Output

      Référence       IBAN Débit Crédit Montant
0    GZF882348G  FR7343563    26   NULL     641
1   SFGUZBJLNJU  FR4832545  NULL    324      33
2  FTLNGZ242112  FR9858331   856   NULL    NULL
3  DFBHGVGHG543  FR2001045  NULL    876     968

Or simply,

df[df[['Débit', 'Crédit', 'Montant']].eq('')] = "NULL"
print(df)

Output

      Référence       IBAN Débit Crédit Montant
0    GZF882348G  FR7343563    26   NULL     641
1   SFGUZBJLNJU  FR4832545  NULL    324      33
2  FTLNGZ242112  FR9858331   856   NULL    NULL
3  DFBHGVGHG543  FR2001045  NULL    876     968



回答4:


Convert to numeric the respective columns and fillna(NULL)

df[['Débit', 'Crédit', 'Montant']]=df.iloc[:,2:].apply(lambda x: pd.to_numeric(x).fillna('NULL'))



     Référence       IBAN Débit Crédit Montant
0    GZF882348G  FR7343563    26   NULL     641
1   SFGUZBJLNJU  FR4832545  NULL    324      33
2  FTLNGZ242112  FR9858331   856   NULL    NULL
3  DFBHGVGHG543  FR2001045  NULL    876     968


来源:https://stackoverflow.com/questions/65065332/python-sql-replacing-the-empty-strings-of-a-dataframe-by-a-null-value-to-in

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!