Question
Hi, I have a scenario where the incoming message is JSON with a header field, say tableName, and a data part that holds that table's column data. I want to write this out as Parquet to separate folders, say /emp and /dept. I can achieve this in regular streaming by aggregating rows based on the tableName, but in Structured Streaming I am unable to split the stream. How can I achieve this in Structured Streaming?
{"tableName":"employee","data":{"empid":1","empname":"john","dept":"CS"} {"tableName":"employee","data":{"empid":2","empname":"james","dept":"CS"} {"tableName":"dept","data":{"dept":"1","deptname":"CS","desc":"COMPUTER SCIENCE DEPT"}
Answer 1:
I got this working by looping through the list of expected tables and, for each of them, filtering the matching records out of the DataFrame, applying the schema and encoder specific to that table, and then writing to the sink. The read happens only once, and writeStream is called once per table; it works fine. Thanks for all the help.
This takes care of dynamic partitioning of the Parquet output folders based on the tables as well.
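Below is a minimal sketch of that approach in Scala, assuming a Kafka source (the topic name, broker address, and checkpoint paths are placeholders, not from the original) and per-table schemas matching the sample messages above. The envelope is split with get_json_object so the nested data payload stays a raw JSON string until each table's own schema is applied.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, get_json_object}
import org.apache.spark.sql.types.{StringType, StructType}

object SplitTablesStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("split-by-table").getOrCreate()

    // Per-table schemas, matching the sample messages above.
    val employeeSchema = new StructType()
      .add("empid", StringType).add("empname", StringType).add("dept", StringType)
    val deptSchema = new StructType()
      .add("dept", StringType).add("deptname", StringType).add("desc", StringType)

    // Each expected table maps to (schema, output folder).
    val tables: Map[String, (StructType, String)] = Map(
      ("employee", (employeeSchema, "/emp")),
      ("dept", (deptSchema, "/dept")))

    // Single streaming read; the Kafka source details are assumptions.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "tables")
      .load()
      .selectExpr("CAST(value AS STRING) AS json")
      // Keep "data" as a JSON string; each table parses it with its own schema.
      .select(
        get_json_object(col("json"), "$.tableName").as("tableName"),
        get_json_object(col("json"), "$.data").as("data"))

    // One writeStream per expected table: filter, parse, write to that folder.
    tables.foreach { case (name, (schema, path)) =>
      raw.filter(col("tableName") === name)
        .select(from_json(col("data"), schema).as("row"))
        .select("row.*")
        .writeStream
        .format("parquet")
        .option("path", path)
        .option("checkpointLocation", s"/checkpoints/$name") // placeholder path
        .start()
    }

    spark.streams.awaitAnyTermination()
  }
}

Each start() call creates an independent query over the same source definition, which is what fans one incoming stream out to /emp and /dept; note that the Parquet sink requires a distinct checkpointLocation per query.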
Source: https://stackoverflow.com/questions/51826841/structured-streaming-different-schema-in-nested-json