livy

How to pull Spark job client logs for jobs submitted via the Apache Livy batches POST method using Airflow

Submitted by 家住魔仙堡 on 2020-02-25 05:38:05
Question: I am submitting Spark jobs using the Apache Livy batches POST method. The HTTP request is sent from Airflow. After submitting a job, I track its status using the batch id. I want to surface the driver (client) logs in the Airflow logs, to avoid checking multiple places (Airflow and Apache Livy/Resource Manager). Is this possible using the Apache Livy REST API? Answer 1: Livy has endpoints to fetch logs: /sessions/{sessionId}/log and /batches/{batchId}/log. Documentation: https://livy.incubator.apache
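A minimal sketch of how an Airflow task could surface those logs, assuming a Livy server reachable at livy_url and a PythonOperator-style callable (both names are mine, not from the question). It polls the /batches/{batchId}/log endpoint the answer mentions and reprints the lines, which Airflow then captures in its own task log:

```python
import time
import requests

def tail_livy_batch_logs(livy_url, batch_id, poll_seconds=30):
    """Poll Livy's /batches/{id}/log endpoint and print new lines.

    Printed output lands in the Airflow task log, so the driver
    (client) logs become visible without leaving Airflow.
    """
    printed = 0
    while True:
        state = requests.get(f"{livy_url}/batches/{batch_id}").json()["state"]
        log = requests.get(
            f"{livy_url}/batches/{batch_id}/log",
            params={"from": printed, "size": 1000},
        ).json()
        lines = log.get("log", [])
        for line in lines:
            print(line)
        printed += len(lines)
        if state in ("success", "dead", "killed"):
            return state
        time.sleep(poll_seconds)
```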

Submitting Spark jobs over Livy using curl

Submitted by 旧时模样 on 2020-01-11 12:57:47
Question: I'm submitting Spark jobs to a Livy (0.6.0) session through curl. The jobs are packaged as a big jar file that extends the Job interface, exactly as in this answer: https://stackoverflow.com/a/49220879/8557851. I create the session with this curl command:

curl -X POST -d '{"kind": "spark","files":["/config.json"],"jars":["/myjar.jar"],"driverMemory":"512M","executorMemory":"512M"}' -H "Content-Type: application/json" localhost:8998/sessions/

When it comes to the code it is exactly like the answer
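For reference, a hedged Python sketch of the same flow using the requests library (endpoint and payload copied from the curl above). A new Livy session starts in the 'starting' state, so it has to reach 'idle' before anything can run against it:

```python
import time
import requests

LIVY = "http://localhost:8998"  # endpoint taken from the curl command

# Equivalent of the curl call: create a Spark session with the jar attached.
payload = {
    "kind": "spark",
    "files": ["/config.json"],
    "jars": ["/myjar.jar"],
    "driverMemory": "512M",
    "executorMemory": "512M",
}
session = requests.post(f"{LIVY}/sessions", json=payload,
                        headers={"Content-Type": "application/json"}).json()

# Wait until the session leaves 'starting' and becomes 'idle'.
while requests.get(f"{LIVY}/sessions/{session['id']}").json()["state"] != "idle":
    time.sleep(2)
```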

How to convert a Livy curl call to a Livy REST API call

Submitted by 我们两清 on 2020-01-06 07:53:59
Question: I am getting started with Livy. In my setup the Livy server runs on a Unix machine, and I am able to curl to it and execute jobs. I have created a fat jar, uploaded it to HDFS, and I simply call its main method from Livy. My JSON payload for Livy looks like this:

{ "file" : "hdfs:///user/data/restcheck/spark_job_2.11-3.0.0-RC1-SNAPSHOT.jar", "proxyUser" : "test_user", "className" : "com.local.test.spark.pipeline.path.LivyTest", "files" : ["hdfs:///user/data/restcheck/hivesite
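As a sketch, the same payload can be POSTed to Livy's /batches endpoint from Python with requests. The host below is a placeholder, and the truncated "files" entry from the question is omitted because its full value isn't shown:

```python
import requests

# Paths copied from the question's payload; the truncated "files" entry is omitted.
payload = {
    "file": "hdfs:///user/data/restcheck/spark_job_2.11-3.0.0-RC1-SNAPSHOT.jar",
    "proxyUser": "test_user",
    "className": "com.local.test.spark.pipeline.path.LivyTest",
}
resp = requests.post(
    "http://livy-host:8998/batches",  # placeholder host; adjust to your setup
    json=payload,
    headers={"Content-Type": "application/json"},
)
print(resp.json())  # contains the batch id used to poll .../batches/{id}
```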

Timeout error: Error with 400 StatusCode: “requirement failed: Session isn't active.”

Submitted by 纵饮孤独 on 2020-01-06 04:26:08
Question: I'm using a Zeppelin v0.7.3 notebook to run PySpark scripts. In one paragraph, I run a script that writes data from a dataframe to a Parquet file in a Blob folder. The file is partitioned per country. The dataframe has 99,452,829 rows. When the script reaches the 1-hour mark, an error is encountered: Error with 400 StatusCode: "requirement failed: Session isn't active." My default interpreter for the notebook is jdbc. I have read about timeoutlifecyclemanager and added it in the interpreter setting
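For what it's worth, this 400 typically means Livy itself expired the session (livy.server.session.timeout, the property discussed in the EMR question below, which I believe defaults to one hour), not Zeppelin. A small sketch, assuming the usual Livy port 8998, for checking session state from outside Zeppelin:

```python
import requests

# Assumed Livy endpoint; Zeppelin's Livy interpreter talks to the same server.
LIVY = "http://livy-host:8998"  # placeholder host

# List sessions and their states. A session that Livy timed out shows up as
# 'dead' (or vanishes), which matches the "Session isn't active" error.
for s in requests.get(f"{LIVY}/sessions").json().get("sessions", []):
    print(s["id"], s["state"])

# The durable fix is server-side: raise livy.server.session.timeout in
# livy.conf (e.g. to 8h) and restart Livy.
```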

How to set livy.server.session.timeout on EMR cluster bootstrap?

Submitted by 大憨熊 on 2020-01-02 08:09:07
Question: I am creating an EMR cluster and using a Jupyter notebook to run some Spark tasks. My tasks die after approximately 1 hour of execution with the error: An error was encountered: Invalid status code '400' from https://xxx.xx.x.xxx:18888/sessions/0/statements/20 with error payload: "requirement failed: Session isn't active." My understanding is that this is related to the Livy config livy.server.session.timeout, but I don't know how to set it in the bootstrap of the cluster (I need to do it
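A hedged sketch of one way to do this: on EMR, livy.conf is managed through the livy-conf configuration classification rather than a bootstrap action (bootstrap actions run before applications are installed, so edits there get overwritten). Instance types, release label, and role names below are illustrative only:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# The "livy-conf" classification writes properties into livy.conf at
# provisioning time, after Livy is installed.
emr.run_job_flow(
    Name="cluster-with-longer-livy-timeout",
    ReleaseLabel="emr-5.30.0",
    Applications=[{"Name": "Spark"}, {"Name": "Livy"}],
    Configurations=[
        {
            "Classification": "livy-conf",
            "Properties": {"livy.server.session.timeout": "8h"},
        }
    ],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "InstanceCount": 1,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```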

Invalid status code '400' from .. error payload: "requirement failed: Session isn't active

Submitted by て烟熏妆下的殇ゞ on 2019-12-31 03:54:12
Question: I am running PySpark scripts in a Jupyter notebook to write a dataframe to a CSV, as below:

df.coalesce(1).write.csv('Data1.csv', header='true')

After an hour of runtime I get the error below. Error: Invalid status code from http://.....session isn't active. My config is like:

spark.conf.set("spark.dynamicAllocation.enabled","true")
spark.conf.set("shuffle.service.enabled","true")
spark.conf.set("spark.dynamicAllocation.minExecutors",6)
spark.conf.set("spark.executor.heartbeatInterval",
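Two things stand out in that config: "shuffle.service.enabled" is missing its "spark." prefix, and allocation-related properties generally take no effect when set on an already-running session. A sketch of setting them at build time instead; the heartbeat value is illustrative because the original line is cut off above:

```python
from pyspark.sql import SparkSession

# Allocation/shuffle settings must be in place before the context starts;
# spark.conf.set() on a live session does not apply most of them.
spark = (
    SparkSession.builder
    .appName("csv-export")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")  # note the spark. prefix
    .config("spark.dynamicAllocation.minExecutors", "6")
    .config("spark.executor.heartbeatInterval", "60s")  # illustrative value
    .getOrCreate()
)

df = spark.range(10).toDF("id")  # stand-in for the real dataframe
df.coalesce(1).write.csv("Data1.csv", header=True)
```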

Zeppelin version 0.7.2 does not support Spark 2.2.0

Submitted by 你说的曾经没有我的故事 on 2019-12-24 03:15:55
Question: How can I downgrade the Spark version? What could the other solutions be? I have to connect my Hive tables to Spark using a Spark session, but the Spark version is not supported by Zeppelin. Answer 1: There are 2 reasons. [1] Zeppelin 0.7.2 marked Spark 2.2+ as an unsupported version: https://github.com/apache/zeppelin/blob/v0.7.2/spark/src/main/java/org/apache/zeppelin/spark/SparkVersion.java#L40

public static final SparkVersion UNSUPPORTED_FUTURE_VERSION = SPARK_2_2_0;

[2] Even if you change the

How to submit Spark jobs to Apache Livy?

Submitted by ≯℡__Kan透↙ on 2019-12-23 04:41:58
Question: I am trying to understand how to submit a Spark job to Apache Livy. I added the following dependencies to my POM.xml:

<dependency>
  <groupId>com.cloudera.livy</groupId>
  <artifactId>livy-api</artifactId>
  <version>0.3.0</version>
</dependency>
<dependency>
  <groupId>com.cloudera.livy</groupId>
  <artifactId>livy-scala-api_2.11</artifactId>
  <version>0.3.0</version>
</dependency>

Then I have the following Spark code that I want to submit to Livy on request:

import org.apache.spark.sql.{DataFrame,
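Separately from the livy-api client the question uses, the same compiled jar can be submitted over Livy's plain REST /batches endpoint, which is sometimes easier to debug. A sketch with placeholder host, jar path, and class name (none of these are from the question):

```python
import requests

# Hypothetical paths/host; the jar is the assembly containing the job class.
payload = {
    "file": "hdfs:///jars/my-livy-job.jar",
    "className": "com.example.MyLivyJob",
}
resp = requests.post("http://livy-host:8998/batches", json=payload,
                     headers={"Content-Type": "application/json"})
print(resp.status_code, resp.json())
```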

Use an existing SparkSession in a POST /batches request

Submitted by a 夏天 on 2019-12-20 04:24:05
Question: I'm trying to use Livy to remotely submit several Spark jobs. Let's say I want to perform the following spark-submit task remotely (with all the options as such):

spark-submit \
--class com.company.drivers.JumboBatchPipelineDriver \
--conf spark.driver.cores=1 \
--conf spark.driver.memory=1g \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.serializer='org.apache.spark.serializer.KryoSerializer' \
--conf "spark.executor.extraJavaOptions= -XX:+UseG1GC" \
--master yarn \
--deploy-mode
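Each spark-submit flag above has a counterpart field in the POST /batches body. A sketch of the mapping in Python; the jar path is a placeholder since the command above is cut off before the application jar:

```python
import requests

# spark-submit flag                -> Livy batch field
# --class                          -> "className"
# --conf spark.driver.cores        -> "driverCores"
# --conf spark.driver.memory       -> "driverMemory"
# remaining --conf key=value pairs -> the "conf" dictionary
payload = {
    "file": "hdfs:///jars/jumbo-batch.jar",  # placeholder; not in the question
    "className": "com.company.drivers.JumboBatchPipelineDriver",
    "driverCores": 1,
    "driverMemory": "1g",
    "conf": {
        "spark.dynamicAllocation.enabled": "true",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
    },
}
# --master / --deploy-mode are fixed by the Livy server's own configuration,
# so they have no per-request field here.
requests.post("http://livy-host:8998/batches", json=payload,
              headers={"Content-Type": "application/json"})
```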

Livy REST Spark java.io.FileNotFoundException:

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-13 03:58:27
Question: I am new to Big Data. I have tried to call Spark jobs with Apache Livy. With the spark-submit command line it works fine; with Livy I get an exception. The command line:

curl -X POST --data '{"file": "/user/romain/spark-examples.jar", "className": "org.apache.spark.examples.SparkPi"}' -H 'Content-Type: application/json' http://localhost:8998/batches

Livy logs:

2019-06-01 00:43:19,160 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where
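A likely cause of the FileNotFoundException is that Livy resolves "/user/romain/spark-examples.jar" as a path on the Livy server's local filesystem, and by default only whitelisted local directories (livy.file.local-dir-whitelist) are allowed; pointing at the jar in HDFS is the simplest fix. A sketch, assuming the jar has been uploaded to the same path in HDFS (e.g. with hdfs dfs -put):

```python
import requests

# Same payload as the curl above, but with an hdfs:// URI so Livy reads
# the jar from HDFS instead of its local filesystem.
payload = {
    "file": "hdfs:///user/romain/spark-examples.jar",
    "className": "org.apache.spark.examples.SparkPi",
}
resp = requests.post("http://localhost:8998/batches", json=payload,
                     headers={"Content-Type": "application/json"})
print(resp.json())
```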