Delete Azure SQL Database rows from Azure Databricks

梦想与她 submitted on 2019-12-23 04:54:09


I have a table in an Azure SQL database, and from Azure Databricks I want to either delete selected rows based on some criteria or clear the entire table. Currently I use the truncate property of the JDBC writer to truncate the table without dropping it, and then rewrite it with a new DataFrame.

df.write \
     .option('user', jdbcUsername) \
     .option('password', jdbcPassword) \
     .jdbc('<connection_string>', '<table_name>', mode = 'overwrite', properties = {'truncate' : 'true'} )

Going forward, though, I don't want to truncate and overwrite the entire table every time; I would rather use a DELETE command. I was not able to achieve this with a pushdown query either. Any help on this would be greatly appreciated.


You can also drop down to Scala to do this, as the SQL Server JDBC driver is already installed. For example:


import java.util.Properties
import java.sql.DriverManager

val jdbcUsername = "xxxxx"
val jdbcPassword = "xxxxxx"
val driverClass = ""

// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://;database=AdventureWorks;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*;loginTimeout=30;"

// Create a Properties() object to hold the parameters.

val connectionProperties = new Properties()

connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
connectionProperties.setProperty("Driver", driverClass)

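// Open a connection, run the DELETE, and close the connection even if the statement fails.
val connection = DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)
try {
  val stmt = connection.createStatement()
  val sql = "delete from sometable where someColumn > 4"
  // executeUpdate returns the number of rows affected by the statement.
  val rowsDeleted = stmt.executeUpdate(sql)
  println(s"Deleted $rowsDeleted rows")
} finally {
  connection.close()
}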



Use pyodbc to execute a SQL statement.

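import pyodbc
# Replace the <...> placeholders with your own values.
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=<server>;DATABASE=<database>;UID=<username>;PWD=<password>')
conn.execute('DELETE TableBlah WHERE 1=2')
conn.commit()  # pyodbc does not autocommit by default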
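
Since the question asks about deleting rows that match some criteria, here is a rough sketch of how the criteria value could be passed as a bound parameter instead of being pasted into the SQL string. The helper name delete_rows_above is made up for illustration, and sqlite3 is used only as a stand-in so the snippet runs without a database server. pyodbc accepts the same "?" placeholder style, so the helper should work with a pyodbc connection too, but that part is an assumption and not tested here.

```python
import sqlite3


def delete_rows_above(conn, table, column, threshold):
    """Delete rows where `column` > `threshold`; return the number of rows removed.

    Hypothetical helper, works with any DB-API connection that uses "?" placeholders.
    """
    # Table and column names cannot be bound as parameters, so accept only plain names.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    cursor = conn.cursor()
    # The threshold is sent as a bound parameter, not formatted into the string.
    cursor.execute(f"DELETE FROM {table} WHERE {column} > ?", (threshold,))
    deleted = cursor.rowcount
    conn.commit()  # without a commit the delete is discarded when the connection closes
    return deleted


# In-memory SQLite database standing in for the Azure SQL connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (someColumn INTEGER)")
conn.executemany("INSERT INTO sometable VALUES (?)", [(n,) for n in range(1, 8)])
conn.commit()

deleted = delete_rows_above(conn, "sometable", "someColumn", 4)
remaining = [row[0] for row in conn.execute(
    "SELECT someColumn FROM sometable ORDER BY someColumn")]
print(deleted, remaining)  # prints: 3 [1, 2, 3, 4]
```

On Databricks you would pass the pyodbc connection from the snippet above in place of the SQLite one. The identifier check is deliberately strict, so schema-qualified names such as dbo.sometable would need a looser rule.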

It's a bit of a pain to get pyodbc working on Databricks - see details here:

