I have a problem running a Spark application on a standalone cluster (I am using Spark 1.1.0). I successfully started the master server with the command:
bash start-master
Found a way to run it from an IDE / with Maven. First, create a fat JAR (one that includes all the dependencies) using the maven-shade-plugin:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.2</version>
    <configuration>
        <filters>
            <filter>
                <artifact>*:*</artifact>
                <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                </excludes>
            </filter>
        </filters>
    </configuration>
    <executions>
        <execution>
            <id>job-driver-jar</id>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <shadedArtifactAttached>true</shadedArtifactAttached>
                <shadedClassifierName>driver</shadedClassifierName>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <!-- Some care is required: http://doc.akka.io/docs/akka/snapshot/general/configuration.html -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>reference.conf</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>mainClass</mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
        <execution>
            <id>worker-library-jar</id>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <shadedArtifactAttached>true</shadedArtifactAttached>
                <shadedClassifierName>worker</shadedClassifierName>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
Then, in your application, set the master URL and the path of the driver JAR on the SparkConf:
SparkConf conf = new SparkConf().setAppName("appName").setMaster("spark://machineName:7077").setJars(new String[] {"target/appName-1.0-SNAPSHOT-driver.jar"});
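For context, a minimal driver class built around that configuration might look roughly like the sketch below (the class name, app name, master host, and JAR path are all placeholders for your own values):

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class AppName {
    public static void main(String[] args) {
        // Point the driver at the standalone master and at the shaded "driver" jar
        // produced by the shade plugin above.
        SparkConf conf = new SparkConf()
                .setAppName("appName")
                .setMaster("spark://machineName:7077")
                .setJars(new String[] {"target/appName-1.0-SNAPSHOT-driver.jar"});

        JavaSparkContext sc = new JavaSparkContext(conf);

        // Trivial job, just to confirm the workers can load classes from the jar.
        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println("count = " + numbers.count());

        sc.stop();
    }
}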
Run mvn clean package to create the JAR file. It will be created in your target folder.
Run it from your IDE, or with the Maven command:
mvn exec:java -Dexec.mainClass="className"
This does not require spark-submit. Just remember to package the application before running it.
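If you would rather not pass the main class on the command line every time, the exec-maven-plugin can also be configured in the pom. This is just a sketch; the plugin version and the className value are placeholders:

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>exec-maven-plugin</artifactId>
    <version>1.3.2</version>
    <configuration>
        <mainClass>className</mainClass>
    </configuration>
</plugin>

With that in place, mvn exec:java alone is enough.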
If you don't want to hardcode the jar path, you can do this:
SparkConf conf = new SparkConf()
    .setAppName("appName")
    .setMaster("spark://machineName:7077")
    .setJars(JavaSparkContext.jarOfClass(this.getClass()));
Then you can run the packaged application directly with:
java -jar target/application-1.0-SNAPSHOT-driver.jar
This will pick up the jar that the class was loaded from.
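Note that this.getClass() only works from an instance context; in a static main you can pass the class literal to JavaSparkContext.jarOfClass instead, which returns the location of the jar containing that class. A minimal sketch (AppName is a placeholder class name):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class AppName {
    public static void main(String[] args) {
        // jarOfClass resolves the jar that contains AppName, so the driver jar
        // path does not have to be hardcoded.
        SparkConf conf = new SparkConf()
                .setAppName("appName")
                .setMaster("spark://machineName:7077")
                .setJars(JavaSparkContext.jarOfClass(AppName.class));

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... your job ...
        sc.stop();
    }
}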
For the benefit of others running into this problem:
I faced an identical issue caused by a mismatch between the Spark connector version and the Spark version in use. Spark was 1.3.1 and the connector was 1.3.0, which produced the same error message:
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 2 in stage 0.0 failed 4 times, most recent failure: Lost
task 2.3 in stage 0.0
Updating the dependency in SBT solved the problem.
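As a rough sketch, the fix amounts to keeping the two versions aligned in build.sbt; the connector coordinates below are placeholders for whichever connector you are actually using:

// Keep the connector version in line with the Spark version.
// "com.example" %% "spark-connector" is a placeholder for the real connector artifact.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.3.1",
  "com.example"      %% "spark-connector" % "1.3.1"
)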