Is Hive faster than Spark?

喜你入骨 submitted on 2019-12-04 09:11:39

Hive is just a framework that gives SQL functionality to MapReduce-type workloads.

These workloads can run on MapReduce, Tez, or Spark as the execution engine, with YARN managing cluster resources.

So the meaningful comparison is Hive on Tez vs. Hive on Spark. A nice article discusses this: "When to go with ETL on Hive using Tez VS When to go with Spark ETL?" (The gist: use Hive on Spark if you are not sure.)

[Chart: lower is better]

javadba

Spark is convenient, but its SQL performance does not hold up all that well at scale.

Hive has amazing support for co-partitioned joins. When the tables you are joining have hundreds of millions to billions of rows, you will really appreciate the fine-grained join support via:

  • matching DISTRIBUTE BY and SORT BY (or CLUSTER BY) on both tables
  • bucketed joins
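
To make the idea behind bucketed joins concrete, here is a minimal sketch in plain Python. It is not Hive's implementation: the function names `bucket_rows` and `bucketed_join`, the sample tables, and the bucket count are all made up for illustration. The point it tries to show is that when both tables are hashed on the join key into the same number of buckets, bucket i of one table only ever needs to be matched against bucket i of the other.

```python
from collections import defaultdict


def bucket_rows(rows, key, num_buckets):
    """Hash each row's join key into one of num_buckets buckets (hypothetical helper)."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[hash(row[key]) % num_buckets].append(row)
    # Keep each bucket sorted by key, loosely mirroring SORT BY within a bucket.
    for bucket in buckets:
        bucket.sort(key=lambda r: r[key])
    return buckets


def bucketed_join(left_buckets, right_buckets, key):
    """Inner-join two tables bucket by bucket; both sides must use the same bucket count."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both tables must have the same number of buckets")
    joined = []
    for left_bucket, right_bucket in zip(left_buckets, right_buckets):
        # Only rows from the matching bucket are indexed, never the whole right table.
        index = defaultdict(list)
        for right_row in right_bucket:
            index[right_row[key]].append(right_row)
        for left_row in left_bucket:
            for right_row in index.get(left_row[key], []):
                joined.append({**left_row, **right_row})
    return joined


# Made-up sample data, for illustration only.
orders = [
    {"user_id": 1, "amount": 10},
    {"user_id": 2, "amount": 5},
    {"user_id": 1, "amount": 7},
]
users = [
    {"user_id": 1, "name": "a"},
    {"user_id": 3, "name": "c"},
]

result = bucketed_join(
    bucket_rows(orders, "user_id", 4),
    bucket_rows(users, "user_id", 4),
    "user_id",
)
```

Because each bucket pair is independent, the per-bucket work stays small and can run in parallel, which is roughly why this style of join pays off on very large tables.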

Hive has extensive support for metadata-only queries; Spark has only had a glimmer of it since 2.1.

Spark runs out of steam quickly once the number of partitions exceeds roughly 10K. Hive does not suffer from this limitation.
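
As a rough illustration of what "metadata-only" means, the sketch below answers two partition-level questions purely from a partition catalog, without opening any data file. The catalog layout, the file paths, and the function names are hypothetical and only stand in for what a metastore would hold.

```python
def distinct_partition_values(partitions, column):
    """Answer 'SELECT DISTINCT <partition column>' from partition specs alone."""
    return sorted({p["spec"][column] for p in partitions})


def max_partition_value(partitions, column):
    """Answer 'SELECT MAX(<partition column>)' from partition specs alone."""
    return max(p["spec"][column] for p in partitions)


# Hypothetical catalog: one entry per partition; the files listed are never read.
catalog = [
    {"spec": {"dt": "2018-10-29"}, "files": ["/warehouse/t/dt=2018-10-29/part-0"]},
    {"spec": {"dt": "2018-10-30"}, "files": ["/warehouse/t/dt=2018-10-30/part-0"]},
    {"spec": {"dt": "2018-10-31"}, "files": ["/warehouse/t/dt=2018-10-31/part-0",
                                             "/warehouse/t/dt=2018-10-31/part-1"]},
]

days = distinct_partition_values(catalog, "dt")
latest = max_partition_value(catalog, "dt")
```

The cost of such a query depends on the number of partitions rather than the volume of data, which is why engine support for it matters on big tables.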

Fast forward to 2018: Hive is much faster (and more stable) than SparkSQL, especially in concurrent environments, according to the following article:

https://mr3.postech.ac.kr/blog/2018/10/31/performance-evaluation-0.4/

The article compares several SQL-on-Hadoop systems using the TPC-DS benchmark (1TB, 3TB, 10TB) on three clusters (11 nodes, 21 nodes, 42 nodes):

  • Hive-LLAP included in HDP (Hortonworks Data Platform) 2.6.4
  • Hive-LLAP included in HDP 3.0.1
  • Presto 0.203e (with cost-based optimization enabled)
  • Presto 0.208e (with cost-based optimization enabled)
  • SparkSQL 2.2.0 included in HDP 2.6.4
  • SparkSQL 2.3.1 included in HDP 3.0.1
  • Hive 3.1.0 running on top of Tez
  • Hive on Tez included in HDP 3.0.1
  • Hive 3.1.0 running on top of MR3 0.4
  • Hive 2.3.3 running on top of MR3 0.4

So, in comparison with Hive-based systems and Presto, SparkSQL is very slow and does not scale in concurrent environments. (Note that the experiment uses SparkSQL running on vanilla Spark.)
