oozie-workflow

Workflow scheduling on GCP Dataproc cluster

落爺英雄遲暮 submitted on 2020-02-24 03:56:08
Question: I have some complex Oozie workflows to migrate from on-prem Hadoop to GCP Dataproc. The workflows consist of shell scripts, Python scripts, Spark-Scala jobs, Sqoop jobs, etc. I have come across some potential solutions for my workflow scheduling needs: (1) Cloud Composer; (2) Dataproc Workflow Template with Cloud Scheduling; (3) Install Oozie on a Dataproc auto-scaling cluster. Please let me know which option would be most efficient in terms of performance, cost, and migration complexity. Answer 1: All 3

“Beeline command not found” error while executing beeline command from python script (called from oozie shell action)

爷,独闯天下 submitted on 2020-01-25 09:50:05
Question: I have a Python script that I want to schedule using Oozie. I am using an Oozie shell action to invoke the script. There is a beeline command in the script. When I run the Oozie workflow, I get the error "sh: beeline: command not found". If I run this script, or just the beeline command, manually from the edge node, it runs perfectly fine. My data platform is Hortonworks 2.6. Below are my workflow.xml and Python script: Workflow.xml <workflow-app xmlns="uri:oozie:workflow:0.3" name="hive2-wf">
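A shell action runs on whichever cluster node YARN picks, not on the edge node, so the PATH seen by the script can differ from the one in an interactive session. As a minimal sketch (an assumption about the cause, since the full workflow and script are not shown), the Python script could resolve beeline explicitly instead of relying on PATH. The fallback locations, the JDBC URL, and the helper names below are hypothetical and would need to match the actual cluster.

```python
import os
import shutil

# Hypothetical fallback locations (assumptions, not taken from the question);
# adjust to wherever beeline is installed on the worker nodes.
CANDIDATE_PATHS = [
    "/usr/bin/beeline",
    "/usr/hdp/current/hive-client/bin/beeline",
]


def find_beeline(candidates=CANDIDATE_PATHS, path=None):
    """Return an absolute path to beeline, or None if it cannot be found.

    First searches PATH (or the explicit `path` string), then falls back
    to the candidate locations.
    """
    found = shutil.which("beeline", path=path)
    if found:
        return found
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def build_beeline_command(beeline, jdbc_url, query):
    """Build an argument list suitable for subprocess.run (no shell involved)."""
    return [beeline, "-u", jdbc_url, "-e", query]


if __name__ == "__main__":
    beeline = find_beeline()
    if beeline is None:
        # Printing PATH helps show what the Oozie launcher actually provides.
        print("beeline not found; PATH is:", os.environ.get("PATH", ""))
    else:
        # The JDBC URL is a placeholder for illustration only.
        print(build_beeline_command(beeline, "jdbc:hive2://host:10000/default", "SHOW DATABASES;"))
```

Logging the PATH from inside the action, as the fallback branch does, is a quick way to confirm or rule out this explanation before changing the workflow.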

adding multiple jars in Oozie-Spark action

本秂侑毒 submitted on 2019-12-24 08:30:13
Question: I'm using HDP 2.6, which has Oozie 4.2 and Spark2 installed. I followed the Hortonworks guide at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-spark-action.html for adding Spark2 libraries to Oozie 4.2. I then submitted the job with this additional property: oozie.action.sharelib.for.spark=spark2 The error I'm getting is this: 2017-07-19 12:36:53,271 WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2]

Can Apache Oozie run docker containers?

你。 submitted on 2019-12-23 20:22:37
Question: I am currently comparing DAG-based workflow tools like Airflow and Luigi for scheduling generic Docker containers as well as Spark jobs. Can Apache Oozie run generic Docker containers through its shell action? Or is Oozie strictly meant for Hadoop tools like Pig and Hive? Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as

OOZIE : Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]

邮差的信 submitted on 2019-12-08 11:19:27
Question: I'm trying to execute an Oozie job by following this URL: https://www.safaribooksonline.com/library/view/apache-oozie/9781449369910/ch05.html When I run oozie job -run -config target/example/job.properties I get this error: Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 1 sec. Retry count = 1 Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 2 sec. Retry
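"Connection refused" generally means nothing accepted the TCP connection at the host and port the client tried, which the Oozie CLI takes from the -oozie option or the OOZIE_URL environment variable. A minimal sketch of a reachability check follows; the helper name is hypothetical, and the default URL http://localhost:11000/oozie is an assumption about a typical setup rather than something stated in the question.

```python
import os
import socket
from urllib.parse import urlparse


def oozie_reachable(oozie_url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds.

    This only checks that something is listening; it does not verify
    that the listener is actually an Oozie server.
    """
    parsed = urlparse(oozie_url)
    host = parsed.hostname
    # Fall back to the scheme's standard port when the URL has none.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Assumed default; replace with the URL of the actual Oozie server.
    url = os.environ.get("OOZIE_URL", "http://localhost:11000/oozie")
    state = "reachable" if oozie_reachable(url) else "NOT reachable"
    print(url, "is", state)
```

If the check fails, the likely next steps are confirming that the Oozie server process is running and that the CLI is pointed at the right host and port.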

Oozie not sending SLA email alerts

♀尐吖头ヾ submitted on 2019-11-27 08:47:51
Question: I used this link from the Oozie documentation to set up SLAs for my Oozie workflow. I then scheduled a job which ran longer than the defined SLAs. However, I am not getting any email alerts for the SLA miss from Oozie. Any idea how I should debug it? Thank you! Source: https://stackoverflow.com/questions/57281650/oozie-not-sending-sla-email-alerts