livy

How to check spark config for an application in Ambari UI, posted with livy

五迷三道 posted on 2019-12-11 17:00:23
Question: I am posting jobs to a Spark cluster using the Livy API. I want to increase the spark.network.timeout value, and I am passing that value (600s) in the conf field of the Livy POST call. How can I verify that it is correctly honoured and applied to the posted jobs?

Source: https://stackoverflow.com/questions/55690915/how-to-check-spark-config-for-an-application-in-ambari-ui-posted-with-livy
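A minimal sketch of such a batch request, with the host, jar path, and class name as placeholders:

    {
      "file": "hdfs:///jars/my-app.jar",
      "className": "com.example.MyApp",
      "conf": { "spark.network.timeout": "600s" }
    }

Once the application is running, the effective value can be checked in the Spark UI (or the Spark History Server) under the Environment tab, where spark.network.timeout should appear among the Spark properties; the application id to look up is returned in the appId field of Livy's GET /batches/{batchId} response.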

Apache Livy cURL not working for spark-submit command

白昼怎懂夜的黑 posted on 2019-12-08 20:05:39
I recently started working with Spark, Scala, HDFS, sbt, and Livy. I am currently trying to create a Livy batch, but it fails with:

    Warning: Skip remote jar hdfs://localhost:9001/jar/project.jar.
    java.lang.ClassNotFoundException: SimpleApp
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
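For context, a Livy batch request names the jar and its main class explicitly; a ClassNotFoundException like the one above typically means the className does not match a fully-qualified class actually packaged in the jar. A minimal sketch of the request body, reusing the path and class name from the error:

    {
      "file": "hdfs://localhost:9001/jar/project.jar",
      "className": "SimpleApp"
    }

If SimpleApp lives in a package, className must be fully qualified (for example com.example.SimpleApp, a hypothetical name), and it is worth confirming that the class was actually compiled into the jar.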

Livy Server: return a dataframe as JSON?

∥☆過路亽.° posted on 2019-12-08 16:33:33
Question: I am executing a statement on the Livy server via an HTTP POST call to localhost:8998/sessions/0/statements with the following body:

    { "code": "spark.sql(\"select * from test_table limit 10\")" }

I would like an answer in the following format:

    (...)
    "data": {
      "application/json": "[
        {"id": "123", "init_date": 1481649345, ...},
        {"id": "133", "init_date": 1481649333, ...},
        {"id": "155", "init_date": 1481642153, ...},
      ]"
    }
    (...)

but what I am getting is:

    (...)
    "data": {
      "text/plain": "res0: org.apache
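One approach, a sketch assuming a Scala session: serialize the rows yourself with Dataset.toJSON so that the returned payload contains one JSON document per row, even though Livy still wraps it as text/plain:

    { "code": "spark.sql(\"select * from test_table limit 10\").toJSON.collect.foreach(println)" }

Getting a true application/json field back depends on the output magics the running Livy version supports (some versions provide a %json magic in interactive sessions), so that is worth checking against the documentation for the version in use.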

How to set Spark configuration properties using Apache Livy?

风格不统一 posted on 2019-12-08 12:54:43
Question: I don't know how to pass SparkSession parameters programmatically when submitting a Spark job to Apache Livy. This is the test Spark job:

    class Test extends Job[Int] {
      override def call(jc: JobContext): Int = {
        val spark = jc.sparkSession()
        // ...
      }
    }

This is how the Spark job is submitted to Livy:

    val client = new LivyClientBuilder()
      .setURI(new URI(livyUrl))
      .build()
    try {
      client.uploadJar(new File(testJarPath)).get()
      client.submit(new Test())
    } finally {
      client.stop(true)
    }

How can I pass
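One way to do this, a sketch assuming a Livy client version whose LivyClientBuilder exposes setConf (keys prefixed with spark. set there are forwarded to the job's Spark configuration; the property values below are placeholders):

    import java.net.URI
    import org.apache.livy.LivyClientBuilder

    // Spark properties go on the builder before build() creates the client.
    val client = new LivyClientBuilder()
      .setURI(new URI(livyUrl))
      .setConf("spark.driver.memory", "2g")
      .setConf("spark.sql.shuffle.partitions", "200")
      .build()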

Spark job submission using Airflow by submitting batch POST method on Livy and tracking job

女生的网名这么多〃 posted on 2019-12-08 01:34:36
Question: I want to use Airflow to orchestrate jobs that include running some Pig scripts, shell scripts, and Spark jobs. For the Spark jobs in particular, I want to use Apache Livy, but I am not sure whether that is a good idea or whether I should run spark-submit directly. And what is the best way to track a Spark job from Airflow once I have submitted it?

Answer 1: My assumption is you have an application JAR containing Java / Scala code that you want to submit to a remote Spark cluster. Livy is arguably the best option for remote spark-submit when evaluated
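Whichever component issues the HTTP calls, tracking comes down to polling Livy's batch endpoints until a terminal state is reached; a sketch of the exchange (the id and the state sequence are illustrative):

    POST /batches              -> {"id": 7, "state": "starting", ...}
    GET  /batches/7/state      -> {"id": 7, "state": "running"}
    GET  /batches/7/state      -> {"id": 7, "state": "success"}

In Airflow this polling is typically wrapped in an operator or sensor that fails the task when the batch ends in the dead state.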

how to set livy.server.session.timeout on EMR cluster bootstrap?

这一生的挚爱 posted on 2019-12-05 19:02:12
I am creating an EMR cluster and using a Jupyter notebook to run some Spark tasks. My tasks die after approximately one hour of execution with the error:

    An error was encountered: Invalid status code '400' from https://xxx.xx.x.xxx:18888/sessions/0/statements/20 with error payload: "requirement failed: Session isn't active."

My understanding is that this is related to the Livy config livy.server.session.timeout, but I don't know how to set it in the bootstrap of the cluster (I need to do it in the bootstrap because the cluster is created with no SSH access). Thanks a lot in advance.

On EMR,
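For what it's worth, EMR normally applies this kind of setting through a configuration classification passed at cluster creation rather than a bootstrap action; a sketch, assuming an EMR release that exposes the livy-conf classification (the timeout value is illustrative):

    [
      {
        "Classification": "livy-conf",
        "Properties": {
          "livy.server.session.timeout": "5h"
        }
      }
    ]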

Use existing SparkSession in POST/batches request

对着背影说爱祢 posted on 2019-12-02 04:01:23
I'm trying to use Livy to remotely submit several Spark jobs. Let's say I want to perform the following spark-submit task remotely (with all the options as-such):

    spark-submit \
      --class com.company.drivers.JumboBatchPipelineDriver \
      --conf spark.driver.cores=1 \
      --conf spark.driver.memory=1g \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.serializer='org.apache.spark.serializer.KryoSerializer' \
      --conf "spark.executor.extraJavaOptions= -XX:+UseG1GC" \
      --master yarn \
      --deploy-mode cluster \
      /home/hadoop/y2k-shubham/jars/jumbo-batch.jar \
      \
      --start=2012-12-21 \
      --end=2012-12-21 \
      -
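A sketch of the equivalent POST /batches body is below. The hdfs:// path is an assumption: Livy needs the jar to be reachable by the cluster, so a local /home/... path usually has to be uploaded to HDFS or S3 first. The application flags cut off in the excerpt would follow the same pattern in args, and the master and deploy mode come from the Livy server's own configuration rather than from the request:

    {
      "file": "hdfs:///user/hadoop/y2k-shubham/jars/jumbo-batch.jar",
      "className": "com.company.drivers.JumboBatchPipelineDriver",
      "conf": {
        "spark.driver.cores": "1",
        "spark.driver.memory": "1g",
        "spark.dynamicAllocation.enabled": "true",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.executor.extraJavaOptions": "-XX:+UseG1GC"
      },
      "args": ["--start=2012-12-21", "--end=2012-12-21"]
    }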