Question
I'm trying to insert and update some data in MySQL using PySpark SQL DataFrames and a JDBC connection.
I've succeeded in inserting new data using SaveMode.Append. Is there a way to update existing data and insert new data into a MySQL table from PySpark SQL?
My code to insert is:
myDataFrame.write.mode(SaveMode.Append).jdbc(JDBCurl, mySqlTable, connectionProperties)
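(For reference: SaveMode.Append is the Scala/Java enum; in PySpark itself the idiomatic equivalent, reusing the same placeholder names, would be something like the following.)

myDataFrame.write.mode("append").jdbc(JDBCurl, mySqlTable, properties=connectionProperties)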
If I change it to SaveMode.Overwrite, it deletes the full table and creates a new one. I'm looking for something like the "INSERT ... ON DUPLICATE KEY UPDATE" statement available in MySQL.
Any help on this is highly appreciated.
Answer 1:
- Create a staging table in MySQL with the same schema as the target table (MySQL triggers can only be attached to base tables, not views):
CREATE TABLE <stagingTable> LIKE <tableName>;
- Create a trigger in MySQL that runs after each insert into the staging table and upserts the row into the target table (the column names id and col1 below are illustrative placeholders):
DELIMITER $$
CREATE TRIGGER trigger_name
AFTER INSERT
ON <stagingTable> FOR EACH ROW
BEGIN
  -- Upsert the newly staged row into the real table;
  -- replace id/col1 with the actual column list.
  INSERT INTO <tableName> (id, col1)
  VALUES (NEW.id, NEW.col1)
  ON DUPLICATE KEY UPDATE col1 = NEW.col1;
END$$
DELIMITER ;
ref - https://www.mysqltutorial.org/mysql-triggers/mysql-after-insert-trigger/
- Write data to the staging table <stagingTable> from Spark in append mode; a minimal sketch follows below.
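A minimal PySpark sketch of that last step, assuming an illustrative JDBC URL, credentials, and a staging table named staging_mytable (adjust all of these to your setup, and make sure the MySQL Connector/J jar is on the Spark classpath):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mysql-upsert-via-trigger").getOrCreate()

# myDataFrame holds the rows to upsert; built here only for illustration.
myDataFrame = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "col1"])

# Append into the staging table; the AFTER INSERT trigger then performs
# the INSERT ... ON DUPLICATE KEY UPDATE into the real table.
myDataFrame.write.mode("append").jdbc(
    url="jdbc:mysql://localhost:3306/mydb",  # assumed JDBC URL
    table="staging_mytable",                 # assumed staging table name
    properties={
        "user": "user",                      # assumed credentials
        "password": "password",
        "driver": "com.mysql.cj.jdbc.Driver",
    },
)

Since rows accumulate in the staging table across loads, it can be truncated between batches.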
Source: https://stackoverflow.com/questions/62695035/insert-update-mysql-table-using-pyspark-dataframes-and-jdbc