While inserting from Hive table to HIve table, It is loading more records that actual records. Can anyone help in this weird behaviour of Hive ?
My query would be lo
If hive.compute.query.using.stats=true;
then optimizer is using statistics for query calculation instead of querying table data. This is much faster because metastore is a fast database like MySQL and does not require map-reduce. But statistics can be not fresh (stale) if the table was loaded not using INSERT OVERWRITE or configuration parameter hive.stats.autogather
responsible for statistics auto gathering was set to false. Also statistics will be not fresh after loading files or after using third-party tools. It's because files was never analyzed, statistics in metastore is not fresh, if you have put new files, nobody knows about how the data was changed. Also after sqoop loading, etc. So, it's a good practice to gather statistics for table or partition after loading using 'ANALYZE TABLE ... COMPUTE STATISTICS'.
In case it's impossible to gather statistics automatically (works for INSERT OVERWRITE) or by running ANALYZE
statement then better to switch off hive.compute.query.using.stats
parameter. Hive will query data instead of using statistics.
See this for reference: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-StatisticsinHive