Hive external table optimal partition size

迷失自我 2021-01-15 02:32

What is the optimal partition size for a Hive external table? I am planning to partition the table by year/month/day, and we receive about 2 GB of data daily.

3 answers
  •  有刺的猬
    2021-01-15 02:58

    The optimal partitioning scheme is the one that matches how your table is actually used. Choose the partitioning based on:

    1. How the data is queried (if you mostly work with daily data, partition by date).
    2. How the data is loaded (parallel threads should each load their own partitions, with no overlap).
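
To illustrate the second point, here is a minimal Python sketch of a loader in which each worker owns exactly one date partition, so no two threads ever write to the same partition. The record shape `(dt, payload)` and the helpers `group_by_partition`, `load_partition`, and `load_all` are hypothetical names made up for this example; the actual write into Hive or HDFS is left as a placeholder.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_partition(records):
    """Group (dt, payload) records by their partition key dt."""
    groups = defaultdict(list)
    for dt, payload in records:
        groups[dt].append(payload)
    return dict(groups)


def load_partition(item):
    """Load one whole partition. Placeholder: only counts the rows.

    A real loader would write the payloads into the partition's
    directory here (assumption: one task per partition).
    """
    dt, payloads = item
    return dt, len(payloads)


def load_all(records, workers=4):
    """Run one task per partition so that writers never overlap."""
    groups = group_by_partition(records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(load_partition, groups.items()))


if __name__ == "__main__":
    sample = [("2021-01-14", "a"), ("2021-01-15", "b"), ("2021-01-15", "c")]
    print(load_all(sample))
```

Because the unit of work is a whole partition, adding workers increases parallelism without creating contention on any single partition.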

    2 GB is not much, even for a single file, though this again depends on your usage scenario. Avoid unnecessarily complex and redundant partition schemes such as (year, month, date); in this case a single date column is enough for partition pruning.
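
A rough Python sketch of why a single date key is convenient: each day maps to one directory, and a date range, even one that crosses a month boundary, is a single `BETWEEN` predicate on one column. The table name `events`, the partition column `dt`, and the base path `/data/events` are assumptions for illustration only, not taken from the question.

```python
from datetime import date, timedelta

TABLE = "events"        # hypothetical table name
BASE = "/data/events"   # hypothetical external table location


def partition_path(day):
    """Directory for one daily partition, using Hive's key=value layout."""
    return f"{BASE}/dt={day.isoformat()}"


def add_partition_stmt(day):
    """HiveQL statement that registers one daily partition."""
    return (
        f"ALTER TABLE {TABLE} ADD IF NOT EXISTS "
        f"PARTITION (dt='{day.isoformat()}') "
        f"LOCATION '{partition_path(day)}'"
    )


def partitions_in_range(start, end):
    """Partition directories that a query over [start, end] needs to read."""
    days = (end - start).days + 1
    return [partition_path(start + timedelta(days=i)) for i in range(days)]


def range_predicate(start, end):
    """WHERE-clause filter on the single partition column."""
    return f"dt BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'"


if __name__ == "__main__":
    start, end = date(2021, 1, 30), date(2021, 2, 2)
    print(add_partition_stmt(start))
    print(range_predicate(start, end))
    for path in partitions_in_range(start, end):
        print(path)
```

With separate year, month, and day keys, the same four-day range would need a compound condition over three columns, while the number of partitions read stays the same.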
