Best way to validate ingested data

会有一股神秘感。 submitted on 2019-12-13 05:19:19

Question


I ingest data daily from various external sources such as GA, scrapers, Google BQ, etc. I store each resulting CSV file in HDFS, create a staging table from it, and then append it to a historical table in Hadoop. Can you share some best practices for validating new data against the historical data? For example, comparing the row count of the current load with the average of the last 10 days, or something like that. Is there a ready-made solution for this in Spark or elsewhere?
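To make the row-count idea concrete, here is a minimal sketch in plain Python. The function name `row_count_is_plausible`, the `window` of 10 days, and the 50% `tolerance` are all illustrative assumptions, not an established rule; in a Spark job the counts would presumably come from something like `df.count()` on the staging table and on each daily partition of the historical table.

```python
from statistics import mean


def row_count_is_plausible(new_count, history, window=10, tolerance=0.5):
    """Check a new load's row count against the recent daily average.

    new_count -- row count of the newly ingested data
    history   -- earlier daily row counts, oldest first (hypothetical input)
    window    -- how many of the most recent days to average (assumed: 10)
    tolerance -- allowed relative deviation from that average (assumed: 0.5)
    """
    recent = history[-window:]
    if not recent:
        # No history yet, so there is nothing to compare against.
        return True
    baseline = mean(recent)
    if baseline == 0:
        # Avoid dividing by zero: only an empty load matches an empty baseline.
        return new_count == 0
    deviation = abs(new_count - baseline) / baseline
    return deviation <= tolerance
```

A load that fails the check could be held in the staging table for review instead of being appended to the historical table. The threshold would need tuning per source, since scraper output tends to fluctuate more than an analytics export.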

Thanks for any advice.

Source: https://stackoverflow.com/questions/52895881/best-way-ho-to-validate-ingested-data
