org.apache.spark.SparkException: Job aborted due to stage failure: Task from application

攒了一身酷 2021-02-13 03:43

I have a problem running a Spark application on a standalone cluster (I use Spark 1.1.0). I successfully started the master server with the command:

bash start-master         


        
2 Answers
  • 2021-02-13 04:21

    I found a way to run it from an IDE / Maven:

    1. Create a fat JAR (one that includes all dependencies). Use the Maven Shade Plugin for this. Example pom:
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.2</version>
        <configuration>
            <filters>
                <filter>
                    <artifact>*:*</artifact>
                    <excludes>
                        <exclude>META-INF/*.SF</exclude>
                        <exclude>META-INF/*.DSA</exclude>
                        <exclude>META-INF/*.RSA</exclude>
                    </excludes>
                </filter>
            </filters>
        </configuration>
        <executions>
            <execution>
                <id>job-driver-jar</id>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <shadedArtifactAttached>true</shadedArtifactAttached>
                    <shadedClassifierName>driver</shadedClassifierName>
                    <transformers>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                        <!--
                        Some care is required:
                        http://doc.akka.io/docs/akka/snapshot/general/configuration.html
                        -->
                        <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                            <resource>reference.conf</resource>
                        </transformer>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                            <mainClass>mainClass</mainClass>
                        </transformer>
                    </transformers>
                </configuration>
            </execution>
            <execution>
                <id>worker-library-jar</id>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <shadedArtifactAttached>true</shadedArtifactAttached>
                    <shadedClassifierName>worker</shadedClassifierName>
                    <transformers>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    </transformers>
                </configuration>
            </execution>
        </executions>
    </plugin>
    
    2. Next, ship the compiled JAR to the cluster. To do this, specify the JAR file in the Spark config:

    SparkConf conf = new SparkConf()
            .setAppName("appName")
            .setMaster("spark://machineName:7077")
            .setJars(new String[] {"target/appName-1.0-SNAPSHOT-driver.jar"});

    3. Run mvn clean package to build the JAR file; it is created in your target folder.

    4. Run it from your IDE or with the Maven command:

    mvn exec:java -Dexec.mainClass="className"

    This does not require spark-submit. Just remember to package the JAR before running.
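As a quick sanity check on steps 1 to 3, the driver JAR can be inspected before the job is launched. The following is a minimal sketch using only java.util.jar; DriverJarCheck and its methods are hypothetical names, not part of Spark or Maven. It assumes the default finalName, under which the Shade plugin writes the attached artifact as target/<artifactId>-<version>-<shadedClassifierName>.jar (hence the -driver.jar suffix in the hardcoded path). It prints the Main-Class set by the ManifestResourceTransformer and lists any META-INF/*.SF, *.DSA or *.RSA files that the filter should have removed; stale signatures copied from signed dependencies are a common reason a merged JAR is rejected with a SecurityException about an invalid signature file digest.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical post-build check for the shaded driver JAR; not part of Spark or Maven.
class DriverJarCheck {

    // Assumes the default finalName: target/<artifactId>-<version>-<classifier>.jar
    static String path(String artifactId, String version, String classifier) {
        return "target/" + artifactId + "-" + version + "-" + classifier + ".jar";
    }

    // Lists META-INF/*.SF, *.DSA and *.RSA entries that the shade filter is meant to exclude.
    static List<String> leftoverSignatureFiles(JarFile jar) {
        List<String> found = new ArrayList<>();
        String dir = "META-INF/";
        jar.stream()
           .map(JarEntry::getName)
           .filter(n -> n.startsWith(dir) && n.indexOf('/', dir.length()) < 0) // direct children only
           .filter(n -> n.endsWith(".SF") || n.endsWith(".DSA") || n.endsWith(".RSA"))
           .forEach(found::add);
        return found;
    }

    // Returns the Main-Class manifest attribute, or null if there is none.
    static String mainClass(JarFile jar) throws IOException {
        Manifest mf = jar.getManifest();
        return mf == null ? null : mf.getMainAttributes().getValue("Main-Class");
    }

    public static void main(String[] args) throws IOException {
        // "appName" and "1.0-SNAPSHOT" are the placeholders used in the answer.
        String p = path("appName", "1.0-SNAPSHOT", "driver");
        if (!new File(p).isFile()) {
            System.err.println(p + " not found; run mvn clean package first");
            return;
        }
        // verify=false: we only list entries and read the manifest.
        try (JarFile jar = new JarFile(p, false)) {
            System.out.println("Main-Class: " + mainClass(jar));
            System.out.println("Leftover signature files: " + leftoverSignatureFiles(jar));
        }
    }
}
```

Run it from the project root after mvn clean package. The expected main class and an empty list of leftover signature files suggest the shade configuration took effect.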

    If you don't want to hard-code the JAR path, you can do this:

    1. In the config, write:

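    SparkConf conf = new SparkConf()
            .setAppName("appName")
            .setMaster("spark://machineName:7077")
            // this.getClass() only works in an instance method; in a static main, pass the class literal instead
            .setJars(JavaSparkContext.jarOfClass(this.getClass()));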

    2. Create the fat JAR (as above) with the Maven package command, then run it directly:

    java -jar target/application-1.0-SNAPSHOT-driver.jar

    This picks up the JAR that the class was loaded from.
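For context, Spark's jarOfClass does roughly the following: it looks up the class's own .class resource and, if that URL starts with jar:file:, returns the path before the '!'; otherwise it returns nothing. That is why this variant expects the application to be started from the fat JAR with java -jar: launched from an IDE, classes usually come from target/classes, so no JAR is found. The sketch below re-implements that idea with the standard library only; JarLocator is a hypothetical name and this is not Spark's code.

```java
import java.net.URL;
import java.util.Optional;

// Hypothetical stdlib-only re-implementation of the lookup described above (not Spark's code).
class JarLocator {

    // Extracts "/path/app.jar" from a URL like "jar:file:/path/app.jar!/pkg/Cls.class".
    static Optional<String> jarPathFromResourceUrl(String url) {
        String prefix = "jar:file:";
        int bang = url.indexOf('!');
        if (!url.startsWith(prefix) || bang < 0) {
            // e.g. "file:/.../target/classes/pkg/Cls.class" when run from an IDE
            return Optional.empty();
        }
        return Optional.of(url.substring(prefix.length(), bang));
    }

    // Finds the JAR (if any) that the given class was loaded from.
    static Optional<String> jarOfClass(Class<?> cls) {
        URL res = cls.getResource("/" + cls.getName().replace('.', '/') + ".class");
        return res == null ? Optional.empty() : jarPathFromResourceUrl(res.toString());
    }

    public static void main(String[] args) {
        // Prints Optional[<jar path>] under java -jar, Optional.empty when run from a classes directory.
        System.out.println(jarOfClass(JarLocator.class));
    }
}
```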

  • 2021-02-13 04:37

    For the benefit of others running into this problem:

    I faced an identical issue caused by a mismatch between the Spark connector and the Spark version being used. Spark was 1.3.1 and the connector was 1.3.0, and an identical error message appeared:

    org.apache.spark.SparkException: Job aborted due to stage failure:
      Task 2 in stage 0.0 failed 4 times, most recent failure: Lost 
      task 2.3 in stage 0.0
    

    Updating the dependency in SBT solved the problem.
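If you want to catch this kind of drift before a job fails on the cluster, a startup check that compares the two versions can help. The sketch below is a hypothetical stdlib-only helper, VersionDrift; you supply both version strings yourself (for example the Spark version your cluster runs and the connector version in your build file). It does not encode any real compatibility rule; it only reports the first component that differs, so the 1.3.1 vs 1.3.0 case from this answer is flagged as a patch difference.

```java
import java.util.Optional;

// Hypothetical helper: reports where two dotted versions diverge, e.g. Spark vs. connector.
class VersionDrift {

    // Returns a description of the first differing component, or empty if the versions match.
    static Optional<String> firstDifference(String spark, String connector) {
        String[] a = spark.split("\\.");
        String[] b = connector.split("\\.");
        String[] names = {"major", "minor", "patch"};
        int n = Math.max(a.length, b.length);
        for (int i = 0; i < n; i++) {
            String x = i < a.length ? a[i] : "0"; // treat a missing component as 0
            String y = i < b.length ? b[i] : "0";
            if (!x.equals(y)) {
                String which = i < names.length ? names[i] : "component " + i;
                return Optional.of(which + " differs: Spark " + spark + " vs connector " + connector);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The versions from the answer above.
        System.out.println(firstDifference("1.3.1", "1.3.0").orElse("versions match"));
    }
}
```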
