How do I install pyspark for use in standalone scripts?


I am trying to use Spark with Python. I installed the Spark 1.0.2 for Hadoop 2 binary distribution from the downloads page. I can run through the quickstart examples in P

5 Answers

梦谈多话

2020-11-30 07:44

You can set the PYTHONPATH manually as you suggest, and this may be useful to you when testing stand-alone non-interactive scripts on a local installation.
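As an alternative to exporting PYTHONPATH in the shell, the same effect can be sketched from inside the script itself. This is a minimal sketch, assuming a binary distribution that keeps its Python sources under `$SPARK_HOME/python` and bundles py4j as `python/lib/py4j-*-src.zip`; the helper names `spark_python_paths` and `add_spark_to_sys_path` are made up for illustration and are not part of Spark.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the entries a script needs on sys.path to import pyspark."""
    python_dir = os.path.join(spark_home, "python")
    # Assumption: py4j ships as a source zip under python/lib; the version
    # in the file name differs between Spark releases, hence the glob.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home=None):
    """Prepend Spark's Python sources to sys.path and return the paths used."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set")
    paths = spark_python_paths(spark_home)
    # Insert in reverse so the final order on sys.path matches `paths`.
    for path in reversed(paths):
        if path not in sys.path:
            sys.path.insert(0, path)
    return paths
```

Call `add_spark_to_sys_path()` at the top of the script, before `import pyspark`. If the bundled py4j zip is found this way, a separate `pip install py4j` should not be needed for local testing.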

However, (Py)Spark is all about distributing your jobs to the nodes of a cluster. Each cluster has a configuration that defines a manager and many parameters; the Spark documentation covers the details of setting this up, including a simple local cluster (which may be useful for testing functionality).

In production, you will be submitting tasks to Spark via spark-submit, which distributes your code to the cluster nodes and establishes the context in which it runs on those nodes. You do, however, need to make sure that the Python installations on the nodes have all the required dependencies (the recommended way) or that the dependencies are passed along with your code (I don't know how that works).
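For the second option, spark-submit has a `--py-files` option that takes a comma-separated list of `.py`, `.zip`, or `.egg` files to ship alongside the main script. The sketch below only assembles such a command line; the function names and the file names `job.py` and `deps.zip` are hypothetical, and it assumes `spark-submit` is on the PATH.

```python
import subprocess


def build_spark_submit_command(script, master="local[2]", py_files=()):
    """Assemble a spark-submit argument list for a Python script."""
    command = ["spark-submit", "--master", master]
    if py_files:
        # --py-files expects one comma-separated value, not repeated flags.
        command += ["--py-files", ",".join(py_files)]
    command.append(script)
    return command


def submit(script, **options):
    """Run spark-submit and raise if it exits with a non-zero status."""
    subprocess.run(build_spark_submit_command(script, **options), check=True)
```

For example, `submit("job.py", master="local[8]", py_files=["deps.zip"])` would run `spark-submit --master local[8] --py-files deps.zip job.py`. Packages with compiled extensions generally still need to be installed on the nodes themselves.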

长发绾君心

2020-11-30 07:45

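Don't export $SPARK_HOME; export SPARK_HOME instead. `export` expects the variable's name, whereas `$SPARK_HOME` is expanded to the variable's value before `export` sees it.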

深忆病人

2020-11-30 07:50
As of Spark 2.2, PySpark is now available in PyPI. Thanks @Evan_Zamir.

pip install pyspark

As of Spark 2.1, you just need to download Spark and run setup.py:
```
cd my-spark-2.1-directory/python/
python setup.py install  # or pip install -e .
```
There is also a ticket for adding it to PyPI.
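Whichever route is used, a quick way to confirm that the install is visible to a standalone script is to check for the package and then run a tiny local job. This is a minimal sketch, assuming Spark 2.0 or later (for `SparkSession`) and a working Java installation; the function names are made up for illustration.

```python
import importlib.util


def pyspark_installed():
    """Return True if the pyspark package can be found on sys.path."""
    return importlib.util.find_spec("pyspark") is not None


def run_smoke_job():
    """Start a local session, sum a small range, and shut the session down."""
    # Imported here so this module still loads when pyspark is missing.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("smoke-test")
        .getOrCreate()
    )
    try:
        return spark.sparkContext.parallelize(range(10)).sum()
    finally:
        spark.stop()
```

Call `pyspark_installed()` first; if it returns `True`, `run_smoke_job()` exercises the full path from Python through py4j to the JVM.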
花落未央

2020-11-30 07:56
I installed PySpark for use in standalone scripts by following a guide. The steps are:
```
export SPARK_HOME="/opt/spark"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
```
Then you need to install py4j:
```
pip install py4j
```
To try it:
```
./bin/spark-submit --master local[8] <python_file.py>
```
天涯浪人

2020-11-30 08:01
From Spark 2.2.0 onwards, use `pip install pyspark` to install PySpark on your machine.

For older versions, follow the steps below. Add the PySpark library to the Python path in your .bashrc:
```
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
```
Also, don't forget to set SPARK_HOME. PySpark depends on the py4j Python package, so install it as follows:
```
pip install py4j
```
For more details about standalone PySpark applications, refer to this post.