I am a newbie with Spark and PySpark. I would appreciate it if somebody could explain what exactly the SparkContext parameter does, and how I can set it.
The SparkContext object is the entry point created by your driver program. It coordinates the processes that run your application across the cluster.
When you run the PySpark shell, a default SparkContext object is created automatically and exposed as the variable sc.
If you create a standalone application, you will need to initialize the SparkContext yourself in your script, like below:

from pyspark import SparkContext

sc = SparkContext("local", "My App")

Here the first parameter is the master URL of the cluster ("local" means run Spark on your machine with a single worker thread), and the second parameter is the name of your app.
I have written an article that goes through the basics of PySpark and Apache Spark, which you may find useful: https://programmathics.com/big-data/apache-spark/apache-installation-and-building-stand-alone-applications/
DISCLAIMER: I am the creator of that website.