Hive table re-create before load every date

Posted by 前提是你 on 2021-01-28 14:30:56

Question


I have seen applications drop an external table, create it again, load the data, and then run the MSCK command on every data load. What is the benefit of dropping and re-creating the table every time?


Answer 1:


There is no benefit in dropping and re-creating an EXTERNAL table, because dropping the table leaves the data intact.

There may be a benefit in dropping and re-creating a MANAGED table, though, because dropping it deletes the data as well.
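Since dropping an external table does not touch the data, a load job can usually just register the new partition instead of re-creating the table. Below is a minimal sketch that only builds the HiveQL strings. The table name, the `dt` partition column, and the bucket path are hypothetical, and the statements would still have to be submitted through whatever Hive client the job uses.

```python
def register_partition_sql(table, dt, base_location):
    """Build a statement that registers one new partition of an external table.

    Assumes (hypothetically) that the table is partitioned by a single
    string column named `dt` and that data lives under <base_location>/dt=<value>.
    """
    location = f"{base_location.rstrip('/')}/dt={dt}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (dt='{dt}') "
        f"LOCATION '{location}'"
    )


def repair_table_sql(table):
    """Build the MSCK statement that scans the table location for new partitions."""
    return f"MSCK REPAIR TABLE {table}"


if __name__ == "__main__":
    # Hypothetical table and bucket, for illustration only.
    print(register_partition_sql("sales", "2021-01-28", "s3://bucket/sales"))
    print(repair_table_sql("sales"))
```

Either statement updates the metastore without a DROP TABLE / CREATE TABLE round trip. Adding the partition explicitly avoids scanning the whole table location, which is what MSCK does.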

One possible scenario, if you are running on S3:

Dropping the files early, well before the load completes rather than at the moment of loading, may reduce the chance of hitting an eventual consistency issue in S3 after the load.

First, once the files are dropped, you may hit an eventual consistency issue when reading the table, both immediately after the drop and for some time afterwards. Dropping the files early gives S3 more time to synchronize before the table is read.

Second, there is the eventual consistency issue that appears when you write files with the same names (rewriting). Dropping early may help, though it is better to use GUID-prefixed (unique) filenames, a timestamp in the partition folder path, or a similar technique to avoid this kind of problem (eventual consistency after rewriting).
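The two naming techniques mentioned above can be sketched as follows. This is only an illustration: the function names, the `dt` partition column, and the folder layout are assumptions, and the partition location in the metastore would still have to be pointed at the new folder after each load.

```python
import uuid
from datetime import datetime, timezone


def unique_file_name(base_name):
    """Prefix a file name with a random GUID so no load overwrites an older file."""
    return f"{uuid.uuid4().hex}_{base_name}"


def timestamped_partition_path(table_location, dt, now=None):
    """Return a partition folder path that carries the load timestamp.

    Hypothetical layout: <table_location>/dt=<dt>/<UTC load timestamp>.
    Every load writes to a fresh folder, so existing objects are never rewritten.
    """
    now = now or datetime.now(timezone.utc)
    load_ts = now.strftime("%Y%m%dT%H%M%SZ")
    return f"{table_location.rstrip('/')}/dt={dt}/{load_ts}"


if __name__ == "__main__":
    # Hypothetical bucket and file names, for illustration only.
    print(unique_file_name("part-00000.parquet"))
    print(timestamped_partition_path("s3://bucket/sales", "2021-01-28"))
```

Because every object key is new, readers never depend on an overwrite becoming visible, which is the case the answer warns about.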



Source: https://stackoverflow.com/questions/58703263/hive-table-re-create-before-load-every-date
