Hive service, HiveServer2 & MetaStore service?

Posted by 丶灬走出姿态 on 2019-12-04 08:42:26

Question


I am trying to understand Hive's architecture, and I am using Tom White's book on Hadoop as a reference.

I came across the following terms in regard to Hive: Hive Services, HiveServer2, and metastore, among others.

I am referring to the diagrams below from the book (Hadoop: The Definitive Guide).

Hive Architecture:

MetaStore configuration:

Hive Architecture which shows what "Driver" is:

I am not able to understand the following:

1) What is "Hive Services" in the Hive architecture diagram? Is it the same thing as HiveServer2?

2) What is "Driver" in the Hive architecture diagram?

3) What is the MetaStore (I am NOT referring to the metastore database)? Is it a running process? If so, is it part of HiveServer2? According to the diagram the MetaStore can be remote, so if it is a JVM process, which component does it belong to?

4) It says "Hive service JVM" and "MetaStore JVM Server". But where do these components get installed? Are they part of the "server" side of Hive?

5) The "Hive Architecture" diagram says "Hive Server". What is this? Is it what we call "Hive Server 1" or "Hive Server 2"?

Can anyone help understand this?


Answer 1:


Hive Services

  • HiveServer2
  • Hive Metastore
  • HCatalog + WebHcat
  • Beeline & Hive CLI
  • Thrift client
  • FileSystem :: HDFS and other compatible filesystems like S3
  • Execution engine :: MapReduce, Tez, Spark
  • Hive Web UI (added in Hive 2.x). Maybe also Tez or Spark UI, but not really

Driver

The JDBC/ODBC or Thrift interfaces have drivers.
There is also the component that interprets the query and compiles it down to execution engine code; that is what the diagram labels "Driver". I personally call that an interpreter or compiler, not a driver.
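
To make the "Driver" box less abstract, here is a toy Python model of the flow described above: a query arrives, the compiler looks up table metadata in the metastore, and the resulting plan is handed to an execution engine. Every name in it (`ToyMetastore`, `ToyDriver`, `compile`, `run`, the `logs` table and its location) is made up for illustration; none of this is Hive's actual code or API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Hypothetical stand-in for the metastore service: table name -> metadata."""
    tables: dict = field(default_factory=dict)

    def get_table(self, name):
        return self.tables[name]


@dataclass
class ToyDriver:
    """Hypothetical stand-in for the Driver: compiles a query, then 'runs' it."""
    metastore: ToyMetastore
    engine: str = "tez"  # assumption: could equally be "mr" or "spark"

    def compile(self, query):
        # A real compiler parses full HiveQL; this toy only finds the table name.
        match = re.search(r"\bFROM\s+(\w+)", query, flags=re.IGNORECASE)
        if match is None:
            raise ValueError("toy compiler only understands '... FROM <table>'")
        table = self.metastore.get_table(match.group(1))
        # The "plan" is just a list of human-readable steps.
        return [f"scan {table['location']}", f"execute on {self.engine}"]

    def run(self, query):
        plan = self.compile(query)
        # A real execution engine would submit jobs; the toy returns the plan.
        return plan


metastore = ToyMetastore({"logs": {"location": "hdfs:///warehouse/logs"}})
driver = ToyDriver(metastore)
plan = driver.run("SELECT COUNT(*) FROM logs")
```

The point of the sketch is only the ordering: the metadata lookup happens during compilation, before anything is sent to the execution engine.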

Metastore Server

It is a separate service from HiveServer2. It is literally a process running on top of an RDBMS (yes, you still need these when running Hive & Hadoop).

Databases supported behind a remote Metastore server = Oracle, MySQL, Postgres
Embedded Metastore (not recommended for production) = Derby

See Hive Wiki

Metastore JVM

The orange boxes show that you can deploy these services either in the same JVM as the driver (interpreter) or as a remote server. The wiki describes these setups.

I believe this is a side-car process that maps the HiveServer2 queries to the MetaStore queries. For example, how do you translate the HiveQL into a process that reads metadata from MySQL or Postgres?

It can run on the server-side, yes, but this is not a recommended setup for fault tolerance and performance reasons.
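
As a rough illustration of the same-JVM versus remote setups, the sketch below classifies a configuration into the three metastore modes usually described for Hive (embedded, local, remote). `hive.metastore.uris` and `javax.jdo.option.ConnectionURL` are real Hive properties, but the function name `metastore_mode`, the host names, and the decision rule itself are my own simplification, not Hive's actual logic.

```python
# Hive's default JDBC URL points at an in-process Derby database.
DEFAULT_JDBC_URL = "jdbc:derby:;databaseName=metastore_db;create=true"


def metastore_mode(conf):
    """Classify a hive-site.xml-style dict as 'embedded', 'local' or 'remote'.

    Simplified rule (an assumption for illustration):
      * hive.metastore.uris set       -> remote: metastore runs in its own JVM
      * no uris, in-process Derby URL -> embedded: metastore and DB share the Hive JVM
      * no uris, any other JDBC URL   -> local: metastore in the Hive JVM, DB elsewhere
    """
    uris = conf.get("hive.metastore.uris", "").strip()
    if uris:
        return "remote"
    jdbc_url = conf.get("javax.jdo.option.ConnectionURL", DEFAULT_JDBC_URL)
    # "jdbc:derby://host:port/..." is a Derby network server, so not in-process.
    in_process_derby = (
        jdbc_url.startswith("jdbc:derby:")
        and not jdbc_url.startswith("jdbc:derby://")
    )
    return "embedded" if in_process_derby else "local"


# Hypothetical hosts, for illustration only.
remote_conf = {"hive.metastore.uris": "thrift://metastore-host:9083"}
local_conf = {"javax.jdo.option.ConnectionURL": "jdbc:mysql://db-host/hive"}
modes = [metastore_mode({}), metastore_mode(local_conf), metastore_mode(remote_conf)]
```

In the remote case the clients only need the Thrift URI; the JDBC settings live with the metastore process itself.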

HiveServer1 is deprecated (it was removed in Hive 1.0.0). Feel free to read about it, but don't use it.




Answer 2:


My understanding is:

Hive Services includes HS2 (sometimes called the Thrift server), the Driver, the Compiler, and the Execution Engine. But these four components are all inside the HiveServer2 process. So in Hive there are three processes:

  • HS2 (includes the HS2 or Thrift server, Driver, Compiler, Execution Engine)
  • MetaStore
  • WebHCat
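
One way to see that these are three separate processes is that a client addresses each of them differently. The sketch below builds the usual endpoint for each, assuming the stock default ports (10000 for HiveServer2, 9083 for the metastore, 50111 for WebHCat); the helper name `endpoint` and the host names are hypothetical, and a real cluster may use different ports.

```python
# Default ports, assuming an unmodified configuration.
DEFAULT_PORTS = {"hiveserver2": 10000, "metastore": 9083, "webhcat": 50111}

# How a client typically addresses each process.
URL_TEMPLATES = {
    "hiveserver2": "jdbc:hive2://{host}:{port}/default",    # JDBC URL
    "metastore": "thrift://{host}:{port}",                  # Thrift URI
    "webhcat": "http://{host}:{port}/templeton/v1/status",  # REST status call
}


def endpoint(service, host, port=None):
    """Return the client-facing address of one Hive process (hypothetical helper)."""
    if port is None:
        port = DEFAULT_PORTS[service]
    return URL_TEMPLATES[service].format(host=host, port=port)


# Hypothetical host names, for illustration only.
hs2_url = endpoint("hiveserver2", "hs2-host")
metastore_uri = endpoint("metastore", "metastore-host")
webhcat_url = endpoint("webhcat", "webhcat-host")
```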


Source: https://stackoverflow.com/questions/49799838/hive-service-hiveserver2-metastore-service
