Question
I'm trying to insert and update some data in MySQL using PySpark SQL DataFrames and a JDBC connection.
I've succeeded in inserting new data using SaveMode.Append. Is there a way to update existing data and insert new data into a MySQL table from PySpark SQL?
My code to insert is:
myDataFrame.write.mode(SaveMode.Append).jdbc(JDBCurl, mySqlTable, connectionProperties)
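(For reference: SaveMode.Append is the Scala/Java enum; in PySpark itself the idiomatic equivalent, reusing the same placeholder names, would be something like the following.)

myDataFrame.write.mode("append").jdbc(JDBCurl, mySqlTable, properties=connectionProperties)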
If I change it to SaveMode.Overwrite, it deletes the full table and creates a new one. I'm looking for something like the "INSERT ... ON DUPLICATE KEY UPDATE" statement available in MySQL.
Any help on this is highly appreciated.
Answer 1:
- Create a staging table in MySQL with the same schema as the target table (MySQL triggers can only be attached to base tables, not views):
CREATE TABLE <stagingTable> LIKE <tableName>;
- Create a trigger in MySQL that runs after each insert into the staging table and upserts the row into the target table (the column names id and col1 below are illustrative placeholders):
DELIMITER $$
CREATE TRIGGER trigger_name
AFTER INSERT
ON <stagingTable> FOR EACH ROW
BEGIN
  -- Upsert the newly staged row into the real table;
  -- replace id/col1 with the actual column list.
  INSERT INTO <tableName> (id, col1)
  VALUES (NEW.id, NEW.col1)
  ON DUPLICATE KEY UPDATE col1 = NEW.col1;
END$$
DELIMITER ;
ref - https://www.mysqltutorial.org/mysql-triggers/mysql-after-insert-trigger/
- Write data to the staging table <stagingTable> from Spark in append mode; a minimal sketch follows below.
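A minimal PySpark sketch of that last step, assuming an illustrative JDBC URL, credentials, and a staging table named staging_mytable (adjust all of these to your setup, and make sure the MySQL Connector/J jar is on the Spark classpath):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mysql-upsert-via-trigger").getOrCreate()

# myDataFrame holds the rows to upsert; built here only for illustration.
myDataFrame = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "col1"])

# Append into the staging table; the AFTER INSERT trigger then performs
# the INSERT ... ON DUPLICATE KEY UPDATE into the real table.
myDataFrame.write.mode("append").jdbc(
    url="jdbc:mysql://localhost:3306/mydb",  # assumed JDBC URL
    table="staging_mytable",                 # assumed staging table name
    properties={
        "user": "user",                      # assumed credentials
        "password": "password",
        "driver": "com.mysql.cj.jdbc.Driver",
    },
)

Since rows accumulate in the staging table across loads, it can be truncated between batches.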
Source: https://stackoverflow.com/questions/62695035/insert-update-mysql-table-using-pyspark-dataframes-and-jdbc