How to build Spark from the sources from the Download Spark page?

让人想犯罪 __ submitted on 2019-12-10 13:37:34


I tried to install and build Spark 2.0.0 on an Ubuntu 16.04 VM as follows:

  1. Install Java

    sudo apt-add-repository ppa:webupd8team/java
    sudo apt-get update       
    sudo apt-get install oracle-java8-installer
  2. Install Scala

    Go to the Downloads tab on the Scala website.

    I used Scala 2.11.8.

    sudo mkdir /usr/local/src/scala
    sudo tar -xvf scala-2.11.8.tgz -C /usr/local/src/scala/

    Add the following lines to your .bashrc file to put Scala on your PATH:

    export SCALA_HOME=/usr/local/src/scala/scala-2.11.8
    export PATH=$SCALA_HOME/bin:$PATH

    Then reload it:

    . ~/.bashrc
  3. Install git

    sudo apt-get install git
  4. Download and build spark

    Go to the Download Spark page and download Spark 2.0.0 (Build from Source - for standalone mode).

    tar -xvf spark-2.0.0.tgz

    Then cd into the extracted Spark folder.

    Now run:

    ./build/sbt assembly

    After it finishes building, I get the message:

    [success] Total time: 1940 s, completed...

    followed by the date and time.

  5. Run Spark shell
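The .bashrc edit in step 2 is easy to apply twice by accident, which leaves duplicate exports behind. A minimal sketch of an idempotent version follows; add_line_once is a hypothetical helper name, not something Scala or Ubuntu provides, and it relies only on standard grep and printf.

```shell
# Hypothetical helper (not part of Scala or Ubuntu): append LINE to FILE
# only if FILE does not already contain exactly that line, so re-running
# the setup does not pile up duplicate exports in ~/.bashrc.
# The ( ) body runs in a subshell, so its variables do not leak.
add_line_once() (
    file=$1
    line=$2
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
)
```

For example, run add_line_once ~/.bashrc 'export SCALA_HOME=/usr/local/src/scala/scala-2.11.8' and then the same for 'export PATH=$SCALA_HOME/bin:$PATH'. The single quotes keep $SCALA_HOME from being expanded until .bashrc is sourced.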


That's when all hell breaks loose and I start getting the error. I went into the assembly folder to look for a folder called target, but there's no such folder there. The only things in assembly are pom.xml, README, and src.
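A quick way to check whether a build produced anything the shell can use is to look for the built jars. This sketch assumes (the question does not say so) that an unpackaged Spark 2.x source tree keeps its jars under assembly/target/scala-<version>/jars; check bin/spark-class in your own checkout to confirm the path. find_built_jars is a hypothetical helper name.

```shell
# Hypothetical diagnostic helper. ASSUMPTION: an unpackaged Spark 2.x
# source tree keeps its built jars under
#   assembly/target/scala-<scala version>/jars
# Verify this against bin/spark-class in your checkout.
# Prints every such directory found; the exit status is 0 only if at
# least one exists.
find_built_jars() (
    spark_home=${1:-.}
    found=1
    for dir in "$spark_home"/assembly/target/scala-*/jars; do
        # With no match the glob stays literal, so the -d test filters it out.
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            found=0
        fi
    done
    [ "$found" -eq 0 ]
)
```

Running find_built_jars from the Spark folder right after the build shows immediately whether assembly/target exists at all.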

I've searched online for quite a while and haven't found a single concrete solution to this error. Can someone please provide explicit step-by-step instructions for solving this?! It's driving me nuts now... (T.T)

[Screenshot of the error]


For some reason, Scala 2.11.8 doesn't work well for the build, but if I switch to Scala 2.10.6, it builds properly. I guess the reason I need Scala in the first place is to get access to sbt so I can build Spark. Once it's built, I go to the Spark folder and type:

build/sbt package

This builds the missing JAR files using Scala 2.11. Kinda weird, but that's how it seems to work (I'm going by the logs).

Once Spark builds again, type bin/spark-shell (from inside the Spark folder) and you'll have access to the Spark shell.
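The sequence in this answer (run build/sbt package from the Spark folder, then start bin/spark-shell) can be wrapped in a small function. This is a sketch under assumptions: ensure_spark_built is a hypothetical name, and the "already built" check assumes the jars land in assembly/target/scala-*/jars, which the answer does not state.

```shell
# Hypothetical wrapper around the answer's steps. From the Spark source
# root, run "build/sbt package" only when no built jars are found.
# ASSUMPTION: built jars live in assembly/target/scala-*/jars.
# The ( ) body is a subshell, so "cd" and "exit" only affect the function.
ensure_spark_built() (
    spark_home=${1:-.}
    cd "$spark_home" || exit 1
    if [ ! -x build/sbt ]; then
        echo "error: $spark_home has no build/sbt; point this at the Spark source root" >&2
        exit 1
    fi
    for dir in assembly/target/scala-*/jars; do
        if [ -d "$dir" ]; then
            echo "already built: $dir"
            exit 0
        fi
    done
    ./build/sbt package
)
```

For example, with the sources unpacked in a hypothetical ~/spark-2.0.0: ensure_spark_built ~/spark-2.0.0 && cd ~/spark-2.0.0 && bin/spark-shell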


Type sbt package in the Spark directory, not in the build directory.
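If you are not sure whether you are in the Spark root or inside build/, a small sketch can walk up the directory tree until it finds the folder that contains build/sbt. find_spark_root is a hypothetical helper, and treating "contains build/sbt" as the marker of the Spark root is an assumption based on the ./build/sbt path used in the question.

```shell
# Hypothetical helper: starting from DIR (default: current directory),
# walk up until reaching a directory that contains build/sbt, which is
# taken to be the Spark source root, and print it. Exit status is 1 if
# no such directory is found before reaching /.
find_spark_root() (
    dir=$(cd "${1:-.}" && pwd) || exit 1
    while [ "$dir" != "/" ]; do
        if [ -f "$dir/build/sbt" ]; then
            printf '%s\n' "$dir"
            exit 0
        fi
        dir=$(dirname "$dir")
    done
    exit 1
)
```

For example: cd "$(find_spark_root)" && ./build/sbt package works the same whether you start in the Spark folder or in its build subfolder.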


If your goal really is to build your own custom Spark package from the sources you downloaded from the Download Spark page, you should do the following instead:

./build/mvn -Phadoop-2.7,yarn,mesos,hive,hive-thriftserver -DskipTests clean install
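For reference: -P activates the comma-separated list of Maven profiles, -DskipTests sets Maven's skipTests property so the tests are not run, and clean install are the Maven phases to execute (remove old build output, then build and install the artifacts into the local Maven repository). The hypothetical wrapper below only assembles that command from a list of profile names. It passes the names through unchecked, so which profiles exist depends on your Spark version.

```shell
# Hypothetical wrapper: join the given profile names with commas and run
# ./build/mvn from the CURRENT directory (which must be the Spark source
# root). With DRY_RUN=1 it only prints the command it would run.
spark_mvn_build() (
    profiles=
    for p in "$@"; do
        profiles=${profiles:+$profiles,}$p
    done
    # Omit -P entirely when no profiles were given.
    set -- ./build/mvn ${profiles:+"-P$profiles"} -DskipTests clean install
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
)
```

For example, DRY_RUN=1 spark_mvn_build hadoop-2.7 yarn hive prints the command without starting a build; drop DRY_RUN=1 to run it.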

You may want to read the official Building Spark documentation.

NB: You don't have to install the Scala and git packages to build Spark, so you could have skipped the "2. Install Scala" and "3. Install git" steps.
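Java is the one prerequisite you do need up front. The Building Spark documentation describes build/mvn as downloading its own Maven, Scala and Zinc, so a pre-flight check can be limited to the Java commands. missing_tools is a hypothetical helper, and checking javac alongside java is an assumption that you want a full JDK rather than only a runtime.

```shell
# Hypothetical pre-flight check: print each named command that is not
# on PATH; exit status is 1 if any are missing, 0 otherwise.
missing_tools() (
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    exit "$status"
)
```

For example, missing_tools java javac prints nothing and succeeds when a JDK is installed and on PATH.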

