How to run multiple instances of Spark 2.0 at once (in multiple Jupyter Notebooks)?


By default Spark's Hive support stores its metastore - the catalog of databases, tables, and other metadata - in Derby, a lightweight embedded database. Embedded Derby only accepts one connection at a time, so only one Spark instance can use the metastore at once; when you start a second notebook and begin running Spark commands, it crashes.

To get around this, you can point Spark's Hive metastore at Postgres instead of Derby; Postgres accepts many concurrent connections, so multiple Spark instances can share it.

Install Postgres via Homebrew, if you do not have it installed already.
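For example (a minimal sketch; brew services assumes a reasonably recent Homebrew, and starting Postgres manually with pg_ctl works just as well):

brew install postgresql
brew services start postgresql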

Then download postgresql-9.4.1212.jar (assuming you are running Java 1.8, a.k.a. Java 8) from https://jdbc.postgresql.org/download.html

Move this .jar file to the libexec/jars/ directory of your Spark installation.

ex: /usr/local/Cellar/apache-spark/2.0.1/

(on Mac you can find where Spark is installed by typing brew info apache-spark in the command line)
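Putting that together, a sketch of the move (the paths assume the jar landed in ~/Downloads and a Homebrew install of Spark 2.0.1 - adjust both for your machine):

mv ~/Downloads/postgresql-9.4.1212.jar /usr/local/Cellar/apache-spark/2.0.1/libexec/jars/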

Next, create hive-site.xml in the libexec/conf directory of your Spark installation.

ex: /usr/local/Cellar/apache-spark/2.0.1/libexec/conf

This can be done in any text editor - just save the file under the exact name hive-site.xml.

hive-site.xml should contain the following text:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://localhost:5432/hive_metastore</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>mypassword</value>
  </property>
</configuration>

'hive' and 'mypassword' can be replaced with whatever makes sense to you, but they must match the user and password you create in the next step.

Finally, create that user and password in Postgres. In the command line, run the following commands:

psql
CREATE USER hive;
ALTER ROLE hive WITH PASSWORD 'mypassword';
CREATE DATABASE hive_metastore;
GRANT ALL PRIVILEGES ON DATABASE hive_metastore TO hive;
\q
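To confirm the role and database are usable, try connecting as the new user (this assumes your pg_hba.conf permits local connections, which is the Homebrew default):

psql -U hive -d hive_metastore -c 'SELECT 1;'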

That's it, you're done. Spark should now run in multiple Jupyter notebooks simultaneously.
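As a quick test, here is a minimal PySpark sketch to run in two notebooks at the same time (it assumes your notebooks use PySpark and that the kernel does not pre-create a SparkSession for you):

from pyspark.sql import SparkSession

# Each notebook builds its own session; with the Postgres-backed metastore,
# both sessions can reach Hive state concurrently instead of fighting over
# Derby's single connection.
spark = SparkSession.builder \
    .appName("metastore-test") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("SHOW DATABASES").show()

If both notebooks print the database list without errors, the Postgres-backed metastore is working.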
