How to submit Spark jobs to Apache Livy?

Asked on 2019-12-23 04:41:58

Question


I am trying to understand how to submit Spark job to Apache Livy.

I added the following dependencies to my pom.xml:

 <dependency>
     <groupId>com.cloudera.livy</groupId>
     <artifactId>livy-api</artifactId>
     <version>0.3.0</version>
 </dependency>

 <dependency>
     <groupId>com.cloudera.livy</groupId>
     <artifactId>livy-scala-api_2.11</artifactId>
     <version>0.3.0</version>
 </dependency>

I then have the following Spark code that I want to submit to Livy on request:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object Test {

  def main(args: Array[String]) {

    val spark = SparkSession.builder()
                            .appName("Test")
                            .master("local[*]")
                            .getOrCreate()


    import spark.sqlContext.implicits._

    implicit val sparkContext = spark.sparkContext

    // ...
  }
}

Next, I have the following code that creates a LivyClient instance and uploads the application JAR to the Spark context:

val client = new LivyClientBuilder()
  .setURI(new URI(livyUrl))
  .build()

try {
  client.uploadJar(new File(testJarPath)).get()

  client.submit(new Test())

} finally {
  client.stop(true)
}

However, the problem is that the Test object is not written to work with Apache Livy.

How can I adjust the Test object so that I can run client.submit(new Test())?


Answer 1:


Your Test class needs to implement Livy's Job interface and override its call method, from which you get access to the JobContext (and through it the SparkContext/SparkSession). You can then pass an instance of Test to the submit method.

You don't have to create the SparkSession yourself; Livy creates it on the cluster, and you can access it inside your call method.

You can find more detailed information on Livy's programmatic API here: https://livy.incubator.apache.org/docs/latest/programmatic-api.html

Here's a sample implementation of Test Class:

import com.cloudera.livy.{Job, JobContext}

class Test extends Job[Int] {

  override def call(jc: JobContext): Int = {

    val spark = jc.sparkSession()

    // Do anything with the SparkSession here

    1 // Return value
  }
}
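With Test implementing Job[Int], the submission code from the question works almost unchanged: client.submit returns a JobHandle[Int] (a java.util.concurrent.Future), so calling get() blocks until the job finishes and yields the value returned by call. A minimal sketch, assuming a running Livy server; livyUrl and testJarPath are placeholders carried over from the question:

```scala
import java.io.File
import java.net.URI
import com.cloudera.livy.LivyClientBuilder

val client = new LivyClientBuilder()
  .setURI(new URI(livyUrl))
  .build()

try {
  // Ship the JAR containing the Test class to the Livy session first
  client.uploadJar(new File(testJarPath)).get()

  // submit returns a JobHandle[Int]; get() blocks until the job completes
  // and yields whatever Test.call returned (1 in the sample above)
  val result: Int = client.submit(new Test()).get()
  println(s"Job returned: $result")
} finally {
  // true = also shut down the remote Spark context
  client.stop(true)
}
```

Note that uploadJar(...).get() must complete before submit, since the cluster-side session needs the class bytecode for Test before it can deserialize and run the job.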


Source: https://stackoverflow.com/questions/49220452/how-to-submit-spark-jobs-to-apache-livy
