I am trying to understand hive
in terms of architecture, and I am referring to Tom White\'s book on Hadoop.
I came across the following terms in regards to
The JDBC/ODBC or Thrift interfaces have drivers.
There are also the processes that interpret the query and compile it down to the execution engine code. I personally call that an interpreter or compiler, not a driver
Not part of HiveServer2. It is literally a process running on top of an RDBMS (yes, you still need these when running Hive & Hadoop).
Supported Remote Metastore servers = Oracle, MySQL, Postgres
Embedded Metastore (not recommended for production) = Derby
See Hive Wiki
Metastore JVM
The orange boxes are showing you can deploy these services as part of the same JVM as the driver (interpreter) or as a remote server. The wiki describes these setups.
I believe this is a side-car process that maps the HiveServer2 queries to the MetaStore queries. For example, how do you translate the HiveQL into a process that reads metadata from MySQL or Postgres?
It can run on the server-side, yes, but this is not a recommended setup for fault tolerance and performance reasons.
HiveServer1 is deprecated. Feel free to read about it, but don't use it.
My understanding is:
Hive Services includes: HS2(may call thrift server sometimes)、Driver, Compiler, Execution Engine. But these four component(HS2、Driver, Compiler, Execution Engine) are all in hiverserver2 process. So in hive, there are three processes: