How to build Spark from the sources from the Download Spark page?

让人想犯罪 __ submitted on 2019-12-10 13:37:34

Question


I tried to install and build Spark 2.0.0 on an Ubuntu 16.04 VM as follows:

  1. Install Java

    sudo apt-add-repository ppa:webupd8team/java
    sudo apt-get update       
    sudo apt-get install oracle-java8-installer
    
  2. Install Scala

    Go to the Downloads page on the Scala site: scala-lang.org/download/all.html

    I used Scala 2.11.8.

    sudo mkdir /usr/local/src/scala
    sudo tar -xvf scala-2.11.8.tgz -C /usr/local/src/scala/
    

    Modify the .bashrc file and include the path for scala:

    export SCALA_HOME=/usr/local/src/scala/scala-2.11.8
    export PATH=$SCALA_HOME/bin:$PATH
    

    then type:

    source ~/.bashrc
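The two exports above can be sanity-checked without opening a new terminal. A minimal sketch (using the same paths as the .bashrc lines above) that confirms the Scala bin directory ends up first on the PATH, so it takes priority over any system-installed scala:

```shell
# Simulate the .bashrc update from step 2 and inspect the result.
SCALA_HOME=/usr/local/src/scala/scala-2.11.8
PATH="$SCALA_HOME/bin:$PATH"

# Print the first PATH entry - it should be the Scala bin directory.
echo "$PATH" | cut -d: -f1
```

If this prints `/usr/local/src/scala/scala-2.11.8/bin`, the shell will pick up the freshly installed `scala` and `scalac` first.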
    
  3. Install git

    sudo apt-get install git
    
  4. Download and build spark

    Go to: http://spark.apache.org/downloads.html

    Download Spark 2.0.0 (Build from Source - for standalone mode).

    tar -xvf spark-2.0.0.tgz
    cd into the Spark folder (that has been extracted).
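The extract-and-enter step above can be made version-agnostic by deriving the folder name from the tarball name (the `.tgz` suffix convention is an assumption based on the download used here):

```shell
# Derive the extracted folder name from the tarball, so the same
# two lines work for any Spark release downloaded this way.
tarball=spark-2.0.0.tgz
dir=${tarball%.tgz}   # strip the .tgz suffix

echo "$dir"           # the folder to cd into after tar -xvf
```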
    

    now type:

    ./build/sbt assembly
    

    After it's done installing, I get the message:

    [success] Total time: 1940 s, completed...

    followed by date and time...

  5. Run Spark shell

    bin/spark-shell
    

That's when all hell breaks loose and I start getting the error. I go into the assembly folder to look for a folder called target. But there's no such folder there. The only things visible in assembly are: pom.xml, README, and src.

I looked online for quite a while and haven't been able to find a single concrete solution to this error. Can someone please provide explicit step-by-step instructions on how to solve this?! It's driving me nuts now... (T.T)

Screenshot of the error: (image not included in this copy)


Answer 1:


For some reason, Scala 2.11.8 does not work well during the build, but if I switch over to Scala 2.10.6 it builds properly. I guess the reason I needed Scala in the first place was to get access to sbt to build Spark. Once it's built, I need to go to the Spark folder and type:

build/sbt package

This builds the missing JAR files for me using Scala 2.11... kinda weird, but that's how it works (I'm assuming so by looking at the logs).

Once Spark builds again, type bin/spark-shell (from inside the Spark folder) and you'll have access to the Spark shell.
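The check this answer implies — spark-shell only starts once the built jars exist — can be sketched as a guarded launch. The `assembly/target/scala-2.11/jars` path is an assumption for a Spark 2.0.x sbt build; this demo stands up a scratch directory in place of a real Spark folder so the logic can be seen end to end:

```shell
# Scratch directory standing in for the Spark source folder
# (a real check would run from the actual Spark folder instead).
demo=$(mktemp -d)
mkdir -p "$demo/assembly/target/scala-2.11/jars"
touch "$demo/assembly/target/scala-2.11/jars/spark-core.jar"

# Guard: only suggest launching spark-shell when the jars exist.
if ls "$demo"/assembly/target/scala-2.11/jars/*.jar >/dev/null 2>&1; then
  status="jars present - bin/spark-shell should start"
else
  status="jars missing - run build/sbt package first"
fi
echo "$status"

rm -rf "$demo"
```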




Answer 2:


Type sbt package in the Spark directory, not in the build directory.




Answer 3:


If your goal is really to build your custom Spark package from the sources you've downloaded from http://spark.apache.org/downloads.html, you should do the following instead:

./build/mvn -Phadoop-2.7,yarn,mesos,hive,hive-thriftserver -DskipTests clean install
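The `-P` flag in the command above activates a comma-separated list of Maven profiles. A quick sketch of what that list expands to (the echo is only illustration, not a Spark command):

```shell
# The profile list passed to Maven's -P flag in the build command above.
profiles="hadoop-2.7,yarn,mesos,hive,hive-thriftserver"

# Split on commas to show the individual profiles being activated.
echo "$profiles" | tr ',' '\n'
```

Each profile pulls in one optional piece of the build (the Hadoop 2.7 dependency set, YARN and Mesos cluster-manager support, and Hive integration with its Thrift server).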

You may want to read the official document Building Spark.

NB: You don't have to install the Scala and git packages to build Spark, so you could have skipped the "2. Install Scala" and "3. Install git" steps.



Source: https://stackoverflow.com/questions/39282434/how-to-build-spark-from-the-sources-from-the-download-spark-page
