structured streaming different schema in nested json

Submitted by 风格不统一 on 2019-12-25 03:19:13

Question


Hi, I have a scenario where the incoming message is JSON with a header, say tableName, and a data part that holds that table's column data. I want to write this out as Parquet into separate folders, say /emp and /dept. In regular streaming I can achieve this by grouping rows by tableName, but in Structured Streaming I am unable to split the stream. How can I achieve this in Structured Streaming?

{"tableName":"employee","data":{"empid":1","empname":"john","dept":"CS"} {"tableName":"employee","data":{"empid":2","empname":"james","dept":"CS"} {"tableName":"dept","data":{"dept":"1","deptname":"CS","desc":"COMPUTER SCIENCE DEPT"}


Answer 1:


I got this working by looping through the list of expected tables: for each one, I filter the matching records out of the DataFrame, apply the schema and encoder specific to that table, and then write to the sink. The read happens only once, and writeStream is called once per table; it works fine. Thanks for all the help.

This also takes care of partitioning the Parquet output folders dynamically by table.
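For reference, here is a minimal PySpark sketch of that per-table loop (the original answer may have used Scala Datasets with encoders). The Kafka source, topic name, output paths, and the per-table schemas below are illustrative assumptions, not taken from the original post:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, get_json_object
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("multi-table-stream").getOrCreate()

# Hypothetical source: any streaming source that yields one JSON message per row works.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
       .option("subscribe", "ingest")                         # assumed topic
       .load()
       .selectExpr("CAST(value AS STRING) AS json"))

# Extract the routing header once; the nested payload stays as raw JSON for now.
routed = raw.withColumn("tableName", get_json_object(col("json"), "$.tableName"))

# Illustrative per-table schemas for the nested `data` object.
data_schemas = {
    "employee": StructType([
        StructField("empid", StringType()),
        StructField("empname", StringType()),
        StructField("dept", StringType()),
    ]),
    "dept": StructType([
        StructField("dept", StringType()),
        StructField("deptname", StringType()),
        StructField("desc", StringType()),
    ]),
}

# One writeStream per expected table: filter, parse with that table's schema,
# and write to a table-specific Parquet folder. The source is read only once.
for table, data_schema in data_schemas.items():
    full_schema = StructType([
        StructField("tableName", StringType()),
        StructField("data", data_schema),
    ])
    (routed.filter(col("tableName") == table)
           .select(from_json(col("json"), full_schema).alias("rec"))
           .select("rec.data.*")
           .writeStream
           .format("parquet")
           .option("path", f"/output/{table}")                # e.g. /output/employee, /output/dept
           .option("checkpointLocation", f"/checkpoints/{table}")
           .start())

spark.streams.awaitAnyTermination()
```

Note that each query needs its own checkpoint location, and each table gets its own Parquet path, which corresponds to the per-table folders (/emp, /dept) asked about in the question.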



Source: https://stackoverflow.com/questions/51826841/structured-streaming-different-schema-in-nested-json
