livy

How to check spark config for an application in Ambari UI, posted with livy

五迷三道 posted on 2019-12-11 17:00:23
Question: I am posting jobs to a Spark cluster using the Livy API. I want to increase the spark.network.timeout value, and I am passing that value (600s) in the conf field of the Livy POST call. How can I verify that it is correctly honoured and applied to the posted jobs?

Source: https://stackoverflow.com/questions/55690915/how-to-check-spark-config-for-an-application-in-ambari-ui-posted-with-livy
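A minimal sketch of such a batch request, with the host, jar path, and class name as placeholders:

    {
      "file": "hdfs:///jars/my-app.jar",
      "className": "com.example.MyApp",
      "conf": { "spark.network.timeout": "600s" }
    }

Once the application is running, the effective value can be checked in the Spark UI (or the Spark History Server) under the Environment tab, where spark.network.timeout should appear among the Spark properties; the application id to look up is returned in the appId field of Livy's GET /batches/{batchId} response.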

Apache Livy cURL not working for spark-submit command

白昼怎懂夜的黑 posted on 2019-12-08 20:05:39
I recently started working with Spark, Scala, HDFS, sbt, and Livy. I am currently trying to create a Livy batch, but it fails with:

    Warning: Skip remote jar hdfs://localhost:9001/jar/project.jar.
    java.lang.ClassNotFoundException: SimpleApp
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
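For context, a Livy batch request names the jar and its main class explicitly; a ClassNotFoundException like the one above typically means the className does not match a fully-qualified class actually packaged in the jar. A minimal sketch of the request body, reusing the path and class name from the error:

    {
      "file": "hdfs://localhost:9001/jar/project.jar",
      "className": "SimpleApp"
    }

If SimpleApp lives in a package, className must be fully qualified (for example com.example.SimpleApp, a hypothetical name), and it is worth confirming that the class was actually compiled into the jar.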

Livy Server: return a dataframe as JSON?

∥☆過路亽.° posted on 2019-12-08 16:33:33
Question: I am executing a statement on the Livy server via an HTTP POST call to localhost:8998/sessions/0/statements with the following body:

    { "code": "spark.sql(\"select * from test_table limit 10\")" }

I would like an answer in the following format:

    (...)
    "data": {
      "application/json": "[
        {"id": "123", "init_date": 1481649345, ...},
        {"id": "133", "init_date": 1481649333, ...},
        {"id": "155", "init_date": 1481642153, ...},
      ]"
    }
    (...)

but what I am getting is:

    (...)
    "data": {
      "text/plain": "res0: org.apache
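One approach, a sketch assuming a Scala session: serialize the rows yourself with Dataset.toJSON so that the returned payload contains one JSON document per row, even though Livy still wraps it as text/plain:

    { "code": "spark.sql(\"select * from test_table limit 10\").toJSON.collect.foreach(println)" }

Getting a true application/json field back depends on the output magics the running Livy version supports (some versions provide a %json magic in interactive sessions), so that is worth checking against the documentation for the version in use.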

How to set Spark configuration properties using Apache Livy?

风格不统一 posted on 2019-12-08 12:54:43
Question: I don't know how to pass SparkSession parameters programmatically when submitting a Spark job to Apache Livy. This is the test Spark job:

    class Test extends Job[Int] {
      override def call(jc: JobContext): Int = {
        val spark = jc.sparkSession()
        // ...
      }
    }

This is how the Spark job is submitted to Livy:

    val client = new LivyClientBuilder()
      .setURI(new URI(livyUrl))
      .build()
    try {
      client.uploadJar(new File(testJarPath)).get()
      client.submit(new Test())
    } finally {
      client.stop(true)
    }

How can I pass
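One way to do this, a sketch assuming a Livy client version whose LivyClientBuilder exposes setConf (keys prefixed with spark. set there are forwarded to the job's Spark configuration; the property values below are placeholders):

    import java.net.URI
    import org.apache.livy.LivyClientBuilder

    // Spark properties go on the builder before build() creates the client.
    val client = new LivyClientBuilder()
      .setURI(new URI(livyUrl))
      .setConf("spark.driver.memory", "2g")
      .setConf("spark.sql.shuffle.partitions", "200")
      .build()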

Spark job submission using Airflow by submitting batch POST method on Livy and tracking job

女生的网名这么多〃 posted on 2019-12-08 01:34:36
Question: I want to use Airflow to orchestrate jobs that include running some Pig scripts, shell scripts, and Spark jobs. For the Spark jobs in particular, I want to use Apache Livy, but I am not sure whether that is a good idea or whether I should run spark-submit directly. And what is the best way to track a Spark job from Airflow once I have submitted it?

Answer 1: My assumption is you have an application JAR containing Java / Scala code that you want to submit to a remote Spark cluster. Livy is arguably the best option for remote spark-submit when evaluated
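Whichever component issues the HTTP calls, tracking comes down to polling Livy's batch endpoints until a terminal state is reached; a sketch of the exchange (the id and the state sequence are illustrative):

    POST /batches              -> {"id": 7, "state": "starting", ...}
    GET  /batches/7/state      -> {"id": 7, "state": "running"}
    GET  /batches/7/state      -> {"id": 7, "state": "success"}

In Airflow this polling is typically wrapped in an operator or sensor that fails the task when the batch ends in the dead state.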

how to set livy.server.session.timeout on EMR cluster bootstrap?

这一生的挚爱 posted on 2019-12-05 19:02:12
I am creating an EMR cluster and using a Jupyter notebook to run some Spark tasks. My tasks die after approximately one hour of execution with the error:

    An error was encountered: Invalid status code '400' from https://xxx.xx.x.xxx:18888/sessions/0/statements/20 with error payload: "requirement failed: Session isn't active."

My understanding is that this is related to the Livy config livy.server.session.timeout, but I don't know how to set it in the bootstrap of the cluster (I need to do it in the bootstrap because the cluster is created with no SSH access). Thanks a lot in advance.

On EMR,
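For what it's worth, EMR normally applies this kind of setting through a configuration classification passed at cluster creation rather than a bootstrap action; a sketch, assuming an EMR release that exposes the livy-conf classification (the timeout value is illustrative):

    [
      {
        "Classification": "livy-conf",
        "Properties": {
          "livy.server.session.timeout": "5h"
        }
      }
    ]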

Use existing SparkSession in POST/batches request

对着背影说爱祢 posted on 2019-12-02 04:01:23
I'm trying to use Livy to remotely submit several Spark jobs. Let's say I want to perform the following spark-submit task remotely (with all the options as-such):

    spark-submit \
      --class com.company.drivers.JumboBatchPipelineDriver \
      --conf spark.driver.cores=1 \
      --conf spark.driver.memory=1g \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.serializer='org.apache.spark.serializer.KryoSerializer' \
      --conf "spark.executor.extraJavaOptions= -XX:+UseG1GC" \
      --master yarn \
      --deploy-mode cluster \
      /home/hadoop/y2k-shubham/jars/jumbo-batch.jar \
      \
      --start=2012-12-21 \
      --end=2012-12-21 \
      -
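A sketch of the equivalent POST /batches body is below. The hdfs:// path is an assumption: Livy needs the jar to be reachable by the cluster, so a local /home/... path usually has to be uploaded to HDFS or S3 first. The application flags cut off in the excerpt would follow the same pattern in args, and the master and deploy mode come from the Livy server's own configuration rather than from the request:

    {
      "file": "hdfs:///user/hadoop/y2k-shubham/jars/jumbo-batch.jar",
      "className": "com.company.drivers.JumboBatchPipelineDriver",
      "conf": {
        "spark.driver.cores": "1",
        "spark.driver.memory": "1g",
        "spark.dynamicAllocation.enabled": "true",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.executor.extraJavaOptions": "-XX:+UseG1GC"
      },
      "args": ["--start=2012-12-21", "--end=2012-12-21"]
    }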