Loading more records than actual in Hive

生来不讨喜 2021-01-14 15:37

When I insert from one Hive table into another Hive table, more records appear to be loaded than the source actually contains. Can anyone help explain this weird behaviour in Hive?

My query would be lo

1 answer
  • 2021-01-14 16:07

    If hive.compute.query.using.stats=true, the optimizer answers queries such as count(*) from the statistics stored in the metastore instead of reading the table data. This is much faster, because the metastore is backed by a fast relational database such as MySQL and no map-reduce job is needed. The statistics can be stale, however, so the reported count may not match the data actually in the table. This happens if the table was loaded by some means other than INSERT OVERWRITE, or if hive.stats.autogather, the configuration parameter responsible for gathering statistics automatically, was set to false. Statistics also go stale after files are put into the table location directly, after a sqoop load, or after loading with other third-party tools: the new files were never analyzed, so the metastore does not know how the data changed. It is therefore good practice to gather statistics for the table or partition after each load using 'ANALYZE TABLE ... COMPUTE STATISTICS'.
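
    As a rough sketch of that practice, the statements below refresh the statistics after a load and then re-check the count. The table name target_table, the partition column dt and its value are hypothetical placeholders, not taken from the question; use the first ANALYZE for an unpartitioned table or the second for a single partition.

    ```sql
    -- Hypothetical names: target_table, partition column dt.

    -- Unpartitioned table: recompute table-level statistics.
    ANALYZE TABLE target_table COMPUTE STATISTICS;

    -- Partitioned table: recompute statistics for the partition just loaded.
    ANALYZE TABLE target_table PARTITION (dt='2021-01-14') COMPUTE STATISTICS;

    -- With fresh statistics, the count answered from the metastore
    -- should agree with the data.
    SELECT COUNT(*) FROM target_table;
    ```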

    If statistics cannot be gathered automatically (which works for INSERT OVERWRITE) or by running an ANALYZE statement, it is better to switch off the hive.compute.query.using.stats parameter. Hive will then query the data instead of using statistics.
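
    A minimal sketch of switching the parameter off for the current session, assuming a hypothetical table named target_table. The count is then computed by scanning the data, which is slower but does not depend on the metastore statistics being fresh.

    ```sql
    -- Disable answering queries from metastore statistics (session level).
    SET hive.compute.query.using.stats=false;

    -- Hypothetical table name; this now runs a job over the actual data.
    SELECT COUNT(*) FROM target_table;
    ```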

    See this for reference: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-StatisticsinHive
