Hive sort operation on a high-volume skewed dataset
Question

I am working on a large dataset of around 3 TB on Hortonworks 2.6.5. The layout of the dataset is pretty straightforward. The hierarchy of the data is as follows:

- Country
- Warehouse
- Product
- Product Type
- Product Serial Id

We have transaction data in the above hierarchy for 30 countries. Each country has more than 200 warehouses, and a single country, the USA, contributes around 75% of the entire dataset.

Problem:

1) We have transaction data with a transaction date column (trans_dt) for the above data