Triggering Spark jobs with REST


Question


I have lately been trying out Apache Spark. My question is more specific: how to trigger Spark jobs. I had posted a question here on understanding Spark jobs. After getting my hands dirty with jobs, I moved on to my requirement.

I have a REST endpoint where I expose an API to trigger jobs; I used Spring 4.0 for the REST implementation. Going ahead, I thought of implementing Jobs as a Service in Spring, where I would submit the job programmatically: when the endpoint is triggered, I would launch the job with the given parameters. I now have a few design options.

  • Similar to the job written below, maintain several jobs invoked through an abstract class, perhaps a JobScheduler (a possible Spring service wrapping this is sketched after the list).

     /* Can this code be abstracted out of the application and written as
        a separate job? My understanding is that the application code itself
        has to embed the jars via setJars, which the SparkContext then
        handles internally. */

     SparkConf sparkConf = new SparkConf()
             .setAppName("MyApp")
             .setJars(new String[] { "/path/to/jar/submit/cluster" })
             .setMaster("/url/of/master/node");
     sparkConf.setSparkHome("/path/to/spark/");
     sparkConf.set("spark.scheduler.mode", "FAIR");

     JavaSparkContext sc = new JavaSparkContext(sparkConf);
     sc.setLocalProperty("spark.scheduler.pool", "test");

     // Application code with the algorithm and transformations
    
  • Extending the above point, have multiple versions of the job handled by the service.

  • Or else use a Spark Job Server to do this.
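
For illustration, this is roughly how I imagine the job could be wrapped in a Spring-managed service (the class name, thread pool size, master URL and jar path are placeholders, not a prescribed design):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.springframework.stereotype.Service;

    @Service
    public class SparkJobService {

        private final ExecutorService executor = Executors.newFixedThreadPool(4);

        /** Trigger the job asynchronously so the REST endpoint can return immediately. */
        public Future<?> submitJob(final String pool, final String inputPath) {
            return executor.submit(new Runnable() {
                @Override
                public void run() {
                    SparkConf conf = new SparkConf()
                            .setAppName("MyApp")
                            .setJars(new String[] { "/path/to/jar/submit/cluster" })
                            .setMaster("spark://master-host:7077"); // placeholder master URL
                    conf.set("spark.scheduler.mode", "FAIR");

                    JavaSparkContext sc = new JavaSparkContext(conf);
                    try {
                        sc.setLocalProperty("spark.scheduler.pool", pool);
                        // Application code: load inputPath, run the algorithm/transformations.
                    } finally {
                        sc.stop();
                    }
                }
            });
        }
    }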

Firstly, I would like to know what the best solution is in this case, execution-wise and also scaling-wise.

Note: I am using a Spark standalone cluster. Kindly help.


Answer 1:


Just use the Spark JobServer https://github.com/spark-jobserver/spark-jobserver

There are a lot of things to consider when building such a service, and the Spark JobServer has most of them covered already. If you find things that aren't good enough, it should be easy to file a request and contribute code to their project rather than reinventing it from scratch.
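
For illustration, kicking off a job on a running JobServer could look roughly like the sketch below (the host, port 8090, app name and class path are placeholders, and it assumes Java 11's built-in HTTP client; the project README documents the exact endpoints):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class JobServerClient {
        public static void main(String[] args) throws Exception {
            // Run a job from a jar previously uploaded under the app name "myapp".
            HttpRequest runJob = HttpRequest.newBuilder()
                    .uri(URI.create("http://jobserver-host:8090/jobs?appName=myapp&classPath=com.example.MyJob"))
                    .POST(HttpRequest.BodyPublishers.ofString("input.path = /data/in"))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(runJob, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body()); // JSON containing the job id and status
        }
    }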




Answer 2:


It turns out Spark has a hidden REST API to submit a job, check its status, and kill it.

Check out the full example here: http://arturmkrtchyan.com/apache-spark-hidden-rest-api
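
For a rough idea, a submission against a standalone master could look like the sketch below (the host, jar path, class name and Spark version are placeholders, and it assumes Java 11's built-in HTTP client; the linked article documents the exact payload and the status/kill endpoints):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class HiddenRestSubmit {
        public static void main(String[] args) throws Exception {
            // CreateSubmissionRequest payload; all values below are placeholders.
            String body = "{"
                    + "\"action\": \"CreateSubmissionRequest\","
                    + "\"appResource\": \"file:/path/to/my-job.jar\","
                    + "\"mainClass\": \"com.example.MyJob\","
                    + "\"appArgs\": [],"
                    + "\"clientSparkVersion\": \"1.3.0\","
                    + "\"environmentVariables\": {\"SPARK_ENV_LOADED\": \"1\"},"
                    + "\"sparkProperties\": {"
                    +   "\"spark.jars\": \"file:/path/to/my-job.jar\","
                    +   "\"spark.app.name\": \"MyJob\","
                    +   "\"spark.master\": \"spark://master-host:6066\","
                    +   "\"spark.submit.deployMode\": \"cluster\","
                    +   "\"spark.driver.supervise\": \"false\"}"
                    + "}";

            HttpRequest submit = HttpRequest.newBuilder()
                    .uri(URI.create("http://master-host:6066/v1/submissions/create"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(submit, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body()); // contains the submissionId used by the status/kill endpoints
        }
    }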




Answer 3:


Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.
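
For comparison, a batch submission through Livy could look roughly like this (the Livy host, port 8998, jar location and class name are placeholders, and the sketch assumes Java 11's built-in HTTP client; see the Livy REST docs for the full API):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class LivySubmit {
        public static void main(String[] args) throws Exception {
            // Ask Livy to run a jar as a batch job; the response includes the batch id,
            // which can later be polled via GET /batches/{id}.
            String body = "{\"file\": \"hdfs:///jars/my-job.jar\", \"className\": \"com.example.MyJob\"}";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://livy-host:8998/batches"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }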




Answer 4:


Here is a good client that you might find helpful: https://github.com/ywilkof/spark-jobs-rest-client

Edit: this answer was given in 2015. There are options like Livy available now.



Source: https://stackoverflow.com/questions/28992802/triggering-spark-jobs-with-rest
