As titled: how do I know which version of Spark is installed on CentOS?
The current system has CDH 5.1.0 installed.
Open a Spark shell terminal and run sc.version.
Whichever shell command you use, spark-shell or pyspark, it will land on a Spark logo with the version name beside it.
$ pyspark
Python 2.6.6 (r266:84292, May 22 2015, 08:34:51)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-15)] on linux2
...
Welcome to
  [Spark logo]   version 1.3.0
You can use the spark-submit command:
spark-submit --version
If you are using pyspark, the Spark version being used can be seen beside the Spark logo as shown below:
manoj@hadoop-host:~$ pyspark
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.6.0
/_/
Using Python version 2.7.6 (default, Jun 22 2015 17:58:13)
SparkContext available as sc, HiveContext available as sqlContext.
>>>
If you want to get the Spark version explicitly, you can use the version attribute of SparkContext as shown below:
>>>
>>> sc.version
u'1.6.0'
>>>
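Once you have the version string from sc.version, you can compare it programmatically, e.g. to gate a script on a minimum Spark version. A minimal sketch, where parse_version is a hypothetical helper and the string literal stands in for the value sc.version returns:

```python
# Hypothetical helper: turn a Spark version string (as returned by
# sc.version, e.g. u'1.6.0') into a tuple that compares numerically.
def parse_version(v):
    return tuple(int(part) for part in v.split(".")[:3])

spark_version = u"1.6.0"  # stand-in for the value returned by sc.version
print(parse_version(spark_version))               # (1, 6, 0)
print(parse_version(spark_version) >= (1, 4, 0))  # True
```

Comparing tuples rather than raw strings avoids the pitfall that "1.10.0" sorts before "1.4.0" lexicographically.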
A non-interactive way, which I use on AWS EMR to install the matching PySpark version:
# pip3 install pyspark==$(spark-submit --version 2>&1| grep -m 1 -Eo "([0-9]{1,}\.)+[0-9]{1,}")
Collecting pyspark==2.4.4
Solution using spark-shell:
# spark-shell --version 2>&1| grep -m 1 -Eo "([0-9]{1,}\.)+[0-9]{1,}"
2.4.4
Solution using spark-submit:
# spark-submit --version 2>&1| grep -m 1 -Eo "([0-9]{1,}\.)+[0-9]{1,}"
2.4.4
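The same extraction the grep pattern performs can be sketched in Python, e.g. inside a bootstrap script. The banner string below is an illustrative sample; in practice it would be captured from spark-submit --version via subprocess:

```python
import re

# Illustrative sample of the banner spark-submit prints; in a real
# script this text would be captured from the command's output.
sample_output = """Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\\ \\/ _ \\/ _ `/ __/  '_/
   /___/ .__/\\_,_/_/ /_/\\_\\   version 2.4.4
      /_/
"""

# Same idea as the grep -Eo pattern: grab the first dotted number group.
match = re.search(r"(\d+\.)+\d+", sample_output)
version = match.group(0) if match else None
print(version)  # 2.4.4
```

This mirrors the regex used in the shell one-liners above, just with Python's re module instead of grep.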
Most of the answers here require initializing a SparkSession. This answer provides a way to statically infer the version from the library.
@ org.apache.spark.SPARK_VERSION
res4: String = "2.4.5"