livy

How to pull Spark job client logs for jobs submitted via the Apache Livy batches POST method using Airflow

Submitted by 家住魔仙堡 on 2020-02-25 05:38:05
Question: I am submitting Spark jobs using the Apache Livy batches POST method. The HTTP request is sent from Airflow. After submitting a job, I track its status using the batch id. I want to surface the driver (client) logs in the Airflow logs, to avoid checking multiple places (Airflow and Apache Livy/Resource Manager). Is this possible using the Apache Livy REST API? Answer 1: Livy has endpoints to fetch logs: /sessions/{sessionId}/log and /batches/{batchId}/log. Documentation: https://livy.incubator.apache
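A minimal sketch of how an Airflow task could surface those logs, assuming a Livy server reachable at livy_url and a PythonOperator-style callable (both names are mine, not from the question). It polls the /batches/{batchId}/log endpoint the answer mentions and reprints the lines, which Airflow then captures in its own task log:

```python
import time
import requests

def tail_livy_batch_logs(livy_url, batch_id, poll_seconds=30):
    """Poll Livy's /batches/{id}/log endpoint and print new lines.

    Printed output lands in the Airflow task log, so the driver
    (client) logs become visible without leaving Airflow.
    """
    printed = 0
    while True:
        state = requests.get(f"{livy_url}/batches/{batch_id}").json()["state"]
        log = requests.get(
            f"{livy_url}/batches/{batch_id}/log",
            params={"from": printed, "size": 1000},
        ).json()
        lines = log.get("log", [])
        for line in lines:
            print(line)
        printed += len(lines)
        if state in ("success", "dead", "killed"):
            return state
        time.sleep(poll_seconds)
```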

Submitting Spark jobs over Livy using curl

Submitted by 旧时模样 on 2020-01-11 12:57:47
Question: I'm submitting Spark jobs to a Livy (0.6.0) session through curl. The jobs are packaged as a big jar file that extends the Job interface, exactly as in this answer: https://stackoverflow.com/a/49220879/8557851. I create the session with this curl command:

curl -X POST -d '{"kind": "spark","files":["/config.json"],"jars":["/myjar.jar"],"driverMemory":"512M","executorMemory":"512M"}' -H "Content-Type: application/json" localhost:8998/sessions/

When it comes to the code it is exactly like the answer
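For reference, a hedged Python sketch of the same flow using the requests library (endpoint and payload copied from the curl above). A new Livy session starts in the 'starting' state, so it has to reach 'idle' before anything can run against it:

```python
import time
import requests

LIVY = "http://localhost:8998"  # endpoint taken from the curl command

# Equivalent of the curl call: create a Spark session with the jar attached.
payload = {
    "kind": "spark",
    "files": ["/config.json"],
    "jars": ["/myjar.jar"],
    "driverMemory": "512M",
    "executorMemory": "512M",
}
session = requests.post(f"{LIVY}/sessions", json=payload,
                        headers={"Content-Type": "application/json"}).json()

# Wait until the session leaves 'starting' and becomes 'idle'.
while requests.get(f"{LIVY}/sessions/{session['id']}").json()["state"] != "idle":
    time.sleep(2)
```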

How to convert a Livy curl call to a Livy REST API call

Submitted by 我们两清 on 2020-01-06 07:53:59
Question: I am getting started with Livy. In my setup the Livy server runs on a Unix machine, and I am able to curl to it and execute jobs. I have created a fat jar, uploaded it to HDFS, and I simply call its main method from Livy. My JSON payload for Livy looks like this:

{ "file" : "hdfs:///user/data/restcheck/spark_job_2.11-3.0.0-RC1-SNAPSHOT.jar", "proxyUser" : "test_user", "className" : "com.local.test.spark.pipeline.path.LivyTest", "files" : ["hdfs:///user/data/restcheck/hivesite
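As a sketch, the same payload can be POSTed to Livy's /batches endpoint from Python with requests. The host below is a placeholder, and the truncated "files" entry from the question is omitted because its full value isn't shown:

```python
import requests

# Paths copied from the question's payload; the truncated "files" entry is omitted.
payload = {
    "file": "hdfs:///user/data/restcheck/spark_job_2.11-3.0.0-RC1-SNAPSHOT.jar",
    "proxyUser": "test_user",
    "className": "com.local.test.spark.pipeline.path.LivyTest",
}
resp = requests.post(
    "http://livy-host:8998/batches",  # placeholder host; adjust to your setup
    json=payload,
    headers={"Content-Type": "application/json"},
)
print(resp.json())  # contains the batch id used to poll .../batches/{id}
```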

Timeout error: Error with 400 StatusCode: “requirement failed: Session isn't active.”

Submitted by 纵饮孤独 on 2020-01-06 04:26:08
Question: I'm using a Zeppelin v0.7.3 notebook to run PySpark scripts. In one paragraph, I run a script that writes data from a dataframe to a Parquet file in a Blob folder. The file is partitioned per country. The dataframe has 99,452,829 rows. When the script reaches the 1-hour mark, an error is encountered: Error with 400 StatusCode: "requirement failed: Session isn't active." My default interpreter for the notebook is jdbc. I have read about timeoutlifecyclemanager and added it in the interpreter setting
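For what it's worth, this 400 typically means Livy itself expired the session (livy.server.session.timeout, the property discussed in the EMR question below, which I believe defaults to one hour), not Zeppelin. A small sketch, assuming the usual Livy port 8998, for checking session state from outside Zeppelin:

```python
import requests

# Assumed Livy endpoint; Zeppelin's Livy interpreter talks to the same server.
LIVY = "http://livy-host:8998"  # placeholder host

# List sessions and their states. A session that Livy timed out shows up as
# 'dead' (or vanishes), which matches the "Session isn't active" error.
for s in requests.get(f"{LIVY}/sessions").json().get("sessions", []):
    print(s["id"], s["state"])

# The durable fix is server-side: raise livy.server.session.timeout in
# livy.conf (e.g. to 8h) and restart Livy.
```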

How to set livy.server.session.timeout on EMR cluster bootstrap?

Submitted by 大憨熊 on 2020-01-02 08:09:07
Question: I am creating an EMR cluster and using a Jupyter notebook to run some Spark tasks. My tasks die after approximately 1 hour of execution with the error: An error was encountered: Invalid status code '400' from https://xxx.xx.x.xxx:18888/sessions/0/statements/20 with error payload: "requirement failed: Session isn't active." My understanding is that this is related to the Livy config livy.server.session.timeout, but I don't know how to set it in the bootstrap of the cluster (I need to do it
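A hedged sketch of one way to do this: on EMR, livy.conf is managed through the livy-conf configuration classification rather than a bootstrap action (bootstrap actions run before applications are installed, so edits there get overwritten). Instance types, release label, and role names below are illustrative only:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# The "livy-conf" classification writes properties into livy.conf at
# provisioning time, after Livy is installed.
emr.run_job_flow(
    Name="cluster-with-longer-livy-timeout",
    ReleaseLabel="emr-5.30.0",
    Applications=[{"Name": "Spark"}, {"Name": "Livy"}],
    Configurations=[
        {
            "Classification": "livy-conf",
            "Properties": {"livy.server.session.timeout": "8h"},
        }
    ],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "InstanceCount": 1,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```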

Invalid status code '400' from .. error payload: "requirement failed: Session isn't active

Submitted by て烟熏妆下的殇ゞ on 2019-12-31 03:54:12
Question: I am running PySpark scripts in a Jupyter notebook to write a dataframe to a CSV, as below:

df.coalesce(1).write.csv('Data1.csv', header='true')

After an hour of runtime I get the error below. Error: Invalid status code from http://.....session isn't active. My config is like:

spark.conf.set("spark.dynamicAllocation.enabled","true")
spark.conf.set("shuffle.service.enabled","true")
spark.conf.set("spark.dynamicAllocation.minExecutors",6)
spark.conf.set("spark.executor.heartbeatInterval",
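Two things stand out in that config: "shuffle.service.enabled" is missing its "spark." prefix, and allocation-related properties generally take no effect when set on an already-running session. A sketch of setting them at build time instead; the heartbeat value is illustrative because the original line is cut off above:

```python
from pyspark.sql import SparkSession

# Allocation/shuffle settings must be in place before the context starts;
# spark.conf.set() on a live session does not apply most of them.
spark = (
    SparkSession.builder
    .appName("csv-export")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")  # note the spark. prefix
    .config("spark.dynamicAllocation.minExecutors", "6")
    .config("spark.executor.heartbeatInterval", "60s")  # illustrative value
    .getOrCreate()
)

df = spark.range(10).toDF("id")  # stand-in for the real dataframe
df.coalesce(1).write.csv("Data1.csv", header=True)
```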

Zeppelin version 0.7.2 does not support Spark 2.2.0

Submitted by 你说的曾经没有我的故事 on 2019-12-24 03:15:55
Question: How can I downgrade the Spark version? What could the other solutions be? I have to connect my Hive tables to Spark using a Spark session, but the Spark version is not supported by Zeppelin. Answer 1: There are 2 reasons. [1] Zeppelin 0.7.2 marked Spark 2.2+ as an unsupported version: https://github.com/apache/zeppelin/blob/v0.7.2/spark/src/main/java/org/apache/zeppelin/spark/SparkVersion.java#L40

public static final SparkVersion UNSUPPORTED_FUTURE_VERSION = SPARK_2_2_0;

[2] Even if you change the

How to submit Spark jobs to Apache Livy?

Submitted by ≯℡__Kan透↙ on 2019-12-23 04:41:58
Question: I am trying to understand how to submit a Spark job to Apache Livy. I added the following dependencies to my POM.xml:

<dependency>
  <groupId>com.cloudera.livy</groupId>
  <artifactId>livy-api</artifactId>
  <version>0.3.0</version>
</dependency>
<dependency>
  <groupId>com.cloudera.livy</groupId>
  <artifactId>livy-scala-api_2.11</artifactId>
  <version>0.3.0</version>
</dependency>

Then I have the following Spark code that I want to submit to Livy on request:

import org.apache.spark.sql.{DataFrame,
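Separately from the livy-api client the question uses, the same compiled jar can be submitted over Livy's plain REST /batches endpoint, which is sometimes easier to debug. A sketch with placeholder host, jar path, and class name (none of these are from the question):

```python
import requests

# Hypothetical paths/host; the jar is the assembly containing the job class.
payload = {
    "file": "hdfs:///jars/my-livy-job.jar",
    "className": "com.example.MyLivyJob",
}
resp = requests.post("http://livy-host:8998/batches", json=payload,
                     headers={"Content-Type": "application/json"})
print(resp.status_code, resp.json())
```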

Use an existing SparkSession in a POST /batches request

Submitted by a 夏天 on 2019-12-20 04:24:05
Question: I'm trying to use Livy to remotely submit several Spark jobs. Let's say I want to perform the following spark-submit task remotely (with all the options as such):

spark-submit \
--class com.company.drivers.JumboBatchPipelineDriver \
--conf spark.driver.cores=1 \
--conf spark.driver.memory=1g \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.serializer='org.apache.spark.serializer.KryoSerializer' \
--conf "spark.executor.extraJavaOptions= -XX:+UseG1GC" \
--master yarn \
--deploy-mode
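Each spark-submit flag above has a counterpart field in the POST /batches body. A sketch of the mapping in Python; the jar path is a placeholder since the command above is cut off before the application jar:

```python
import requests

# spark-submit flag                -> Livy batch field
# --class                          -> "className"
# --conf spark.driver.cores        -> "driverCores"
# --conf spark.driver.memory       -> "driverMemory"
# remaining --conf key=value pairs -> the "conf" dictionary
payload = {
    "file": "hdfs:///jars/jumbo-batch.jar",  # placeholder; not in the question
    "className": "com.company.drivers.JumboBatchPipelineDriver",
    "driverCores": 1,
    "driverMemory": "1g",
    "conf": {
        "spark.dynamicAllocation.enabled": "true",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
    },
}
# --master / --deploy-mode are fixed by the Livy server's own configuration,
# so they have no per-request field here.
requests.post("http://livy-host:8998/batches", json=payload,
              headers={"Content-Type": "application/json"})
```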

Livy REST Spark java.io.FileNotFoundException:

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-13 03:58:27
Question: I am new to Big Data. I have tried to call Spark jobs with Apache Livy. With the spark-submit command line it works fine; with Livy I get an exception. The command line:

curl -X POST --data '{"file": "/user/romain/spark-examples.jar", "className": "org.apache.spark.examples.SparkPi"}' -H 'Content-Type: application/json' http://localhost:8998/batches

Livy logs:

2019-06-01 00:43:19,160 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where
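A likely cause of the FileNotFoundException is that Livy resolves "/user/romain/spark-examples.jar" as a path on the Livy server's local filesystem, and by default only whitelisted local directories (livy.file.local-dir-whitelist) are allowed; pointing at the jar in HDFS is the simplest fix. A sketch, assuming the jar has been uploaded to the same path in HDFS (e.g. with hdfs dfs -put):

```python
import requests

# Same payload as the curl above, but with an hdfs:// URI so Livy reads
# the jar from HDFS instead of its local filesystem.
payload = {
    "file": "hdfs:///user/romain/spark-examples.jar",
    "className": "org.apache.spark.examples.SparkPi",
}
resp = requests.post("http://localhost:8998/batches", json=payload,
                     headers={"Content-Type": "application/json"})
print(resp.json())
```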