setting SparkContext for pyspark

悲&欢浪女 2021-02-05 11:17

I am a newbie with Spark and PySpark. I would appreciate it if somebody could explain what exactly the SparkContext parameter does, and how I could set it for a Python application.

3 Answers
  •  愿得一人 2021-02-05 11:49

    The first thing a Spark program must do is to create a SparkContext object, which tells Spark how to access a cluster. To create a SparkContext you first need to build a SparkConf object that contains information about your application.

    If you are running the pyspark shell, Spark automatically creates the SparkContext object for you under the name sc. But if you are writing your own Python program, you have to create it yourself, for example:

    from pyspark import SparkContext

    # Create the SparkContext yourself when running outside the pyspark shell
    sc = SparkContext(appName="test")
    
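
    Once the context exists you can use it right away. A minimal check (assuming the sc created above) that it is working:

    # Distribute a small list across the workers and pull the sum back to the driver
    rdd = sc.parallelize(range(10))
    print(rdd.sum())  # prints 45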

    Any configuration goes into this SparkContext object, for example setting the executor memory or the number of executor cores.
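
    For instance, a minimal sketch using SparkConf (the master URL and resource values below are only illustrative placeholders, not recommendations):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("test")
            .setMaster("local[*]")                  # e.g. "yarn" on a cluster
            .set("spark.executor.memory", "2g")     # memory per executor
            .set("spark.executor.cores", "2"))      # cores per executor
    sc = SparkContext(conf=conf)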

    These parameters can also be passed on the command line when the application is submitted, for example:

    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --num-executors 3 \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    lib/spark-examples*.jar \
    10
    
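
    The same settings can also be expressed with generic --conf flags; a sketch of an equivalent invocation (the keys are standard Spark configuration property names):

    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --conf spark.executor.instances=3 \
    --conf spark.driver.memory=4g \
    --conf spark.executor.memory=2g \
    --conf spark.executor.cores=1 \
    lib/spark-examples*.jar \
    10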

    To pass these parameters to the pyspark shell, use something like this:

    ./bin/pyspark --num-executors 17 --executor-cores 5 --executor-memory 8G
    
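
    Inside the shell, the pre-created sc should reflect those flags. A quick sanity check (these are standard configuration keys, and the commented values correspond to the flags above):

    sc.getConf().get("spark.executor.memory")   # '8g'
    sc.getConf().get("spark.executor.cores")    # '5'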
