hue

Kaggle | IEEE Fraud Detection (EDA)

丶灬走出姿态 submitted on 2020-02-08 07:55:53
IEEE Fraud Detection - EDA
1 Description
In this competition, you’ll benchmark machine learning models on a challenging large-scale dataset. The data comes from Vesta’s real-world e-commerce transactions and contains a wide range of features from device type to product features. You also have the opportunity to create new features to improve your results. If successful, you’ll improve the efficacy of fraudulent transaction alerts for millions of people around the world, helping hundreds of thousands of businesses reduce their fraud loss and increase their revenue. And of course, you will save

Impala SQL: Merging rows with overlapping dates. WHERE EXISTS and recursive CTE not supported

徘徊边缘 submitted on 2020-02-01 05:50:24
Question: I am trying to merge rows with overlapping date intervals in a table in Impala SQL. However, the solutions I have found rely on features Impala does not support, e.g. WHERE EXISTS and recursive CTEs. How would I write a query for this in Impala?
Table: @T
    ID  StartDate  EndDate
    1   20170101   20170201
    2   20170101   20170401
    3   20170505   20170531
    4   20170530   20170531
    5   20170530   20170831
    6   20171001   20171005
    7   20171101   20171225
    8   20171105   20171110
Required output:
    StartDate  EndDate
    20170101   20170401
    20170505
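Merging overlapping intervals without recursion is usually done with the "gaps and islands" technique: sort by start date, compare each row with the largest end date seen in the earlier rows, flag a new island whenever the row starts after that, number the islands with a running sum of the flag, then take MIN(StartDate) and MAX(EndDate) per island. The sketch below is not an Impala query; it is a Python illustration of those steps on the question's table, assuming yyyymmdd integers (so numeric order is date order) and treating intervals that share a day as overlapping. In Impala the same steps would map to analytic functions (a running MAX, a CASE flag, a running SUM), assuming your Impala version supports the window frames involved.

```python
from itertools import groupby

# Rows from the question's table @T: (ID, StartDate, EndDate).
# Dates are yyyymmdd integers, so numeric order equals chronological order.
ROWS = [
    (1, 20170101, 20170201),
    (2, 20170101, 20170401),
    (3, 20170505, 20170531),
    (4, 20170530, 20170531),
    (5, 20170530, 20170831),
    (6, 20171001, 20171005),
    (7, 20171101, 20171225),
    (8, 20171105, 20171110),
]


def merge_overlapping(rows):
    """Gaps-and-islands merge, written step by step as a window query would compute it."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))  # ORDER BY StartDate, EndDate
    tagged = []
    running_max_end = None  # MAX(EndDate) over all preceding rows
    island = 0              # running SUM of the "new island" flag
    for _id, start, end in ordered:
        # A row opens a new island when it starts after every earlier row has ended.
        if running_max_end is None or start > running_max_end:
            island += 1
        tagged.append((island, start, end))
        running_max_end = end if running_max_end is None else max(running_max_end, end)
    merged = []
    # GROUP BY island: MIN(StartDate), MAX(EndDate)
    for _island, group in groupby(tagged, key=lambda t: t[0]):
        group = list(group)
        merged.append((min(t[1] for t in group), max(t[2] for t in group)))
    return merged


for start, end in merge_overlapping(ROWS):
    print(start, end)
```

The first two merged rows agree with the visible part of the required output (20170101 to 20170401, then an island starting at 20170505).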

0475 - How to unify the time zones of Hue and Oozie

我是研究僧i submitted on 2020-01-28 00:43:25
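Fayson's GitHub: https://github.com/fayson/cdhproject
Thanks to group members rong and 王峰 for raising and solving this problem.
1 Purpose of this document
In earlier articles, Fayson covered "How to change the time zone of Cloudera Manager", "How to change the time zone of Hue" and "How to change the time zone of CDSW sessions". When creating scheduled jobs with Hue, you will notice that Hue's time zone does not match Oozie's scheduling time. Oozie uses UTC by default, so when creating a scheduled job you have to subtract 8 hours from the local time to get the schedule you expect. This is inconvenient, so this article explains how to make Hue and Oozie use the same time zone.
Test environment:
1. RedHat 7.2
2. CM and CDH version 5.15.0
2 Setting the Hue time zone
Hue's default time zone is America/Los_Angeles; it needs to be changed to Asia/Shanghai in CM.
1. Log in to Cloudera Manager, open the Hue configuration page and search for "time_zone".
2. Change the time zone to Asia/Shanghai.
Save the configuration and restart the Hue service. This completes the Hue time zone setting.
3 Changing the Oozie time zone
Oozie's default time zone is UTC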
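The "subtract 8 hours" adjustment is just the conversion from Asia/Shanghai wall-clock time to UTC. A minimal sketch, assuming Asia/Shanghai is a fixed UTC+8 offset (China has not used daylight saving time since 1991), so a fixed-offset tzinfo stands in for a full time zone database. The helper names are hypothetical, and the output format is the UTC form used in Oozie coordinator times (YYYY-MM-DDTHH:mmZ).

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Shanghai is a constant UTC+8 offset with no DST,
# so a fixed-offset tzinfo is sufficient for this illustration.
SHANGHAI = timezone(timedelta(hours=8), "Asia/Shanghai")


def local_to_utc(local_naive: datetime) -> datetime:
    """Interpret a naive wall-clock time as Asia/Shanghai and convert it to UTC."""
    return local_naive.replace(tzinfo=SHANGHAI).astimezone(timezone.utc)


def to_oozie_utc(local_naive: datetime) -> str:
    """Format the converted time as YYYY-MM-DDTHH:mmZ (hypothetical helper)."""
    return local_to_utc(local_naive).strftime("%Y-%m-%dT%H:%MZ")


# A job meant to run at 02:00 Beijing time must be entered as 18:00 UTC the previous day.
wanted_local = datetime(2020, 1, 28, 2, 0)
print(to_oozie_utc(wanted_local))
```

Where a time zone database is installed, `zoneinfo.ZoneInfo("Asia/Shanghai")` could replace the fixed offset.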

Hue notebook migration error: Document does not exist or you don't have the permission to access it.

纵然是瞬间 submitted on 2020-01-22 09:31:08
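Background: CDH 5.16.2, Sentry + Hive + Hue, with notebooks enabled. After integrating Sentry, the accounts in Hue had to be separated, so the notebooks of the existing user hdfs needed to be migrated to other, new users.
Log in as the old user: the Export function in the Hue UI exports all notebooks at once (see figure).
Log in as the new user and import the JSON file downloaded in the previous step (see figure).
Everything looks fine, but when the new user opens an imported notebook, Hue reports: Document does not exist or you don't have the permission to access it.
Straight to the solution:
1. Open the notebook and simply click the error message. The Documents page will then contain two notebooks with the same name and identical content; delete the older one, judging by its timestamp.
2. To analyze the root cause, look at the exported JSON file, which looks like this:
[ { "pk": 408840, "model": "desktop.document2", "fields": { "search": "", "uuid": "7687b328-e9b7-4c53-a4fe-9c5edee4b1ed", "extra": "", "type": "notebook", "description": "", "is_history": false, "parent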
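Before importing, it can help to see which `pk` and `uuid` each exported document carries, since those identifiers travel with the JSON file. A minimal sketch, assuming the export is a JSON list shaped like the excerpt (`pk`, `model`, and a `fields` object with `type`, `uuid`, `is_history`). The sample below is trimmed to fields visible in the post, and the function name is hypothetical.

```python
import json

# Trimmed sample built only from fields visible in the post's export excerpt;
# real exports contain more fields per document.
SAMPLE_EXPORT = """[
  {"pk": 408840, "model": "desktop.document2",
   "fields": {"uuid": "7687b328-e9b7-4c53-a4fe-9c5edee4b1ed",
              "type": "notebook", "is_history": false}}
]"""


def list_documents(export_text):
    """Return (pk, model, type, uuid) for each non-history document in an export."""
    rows = []
    for doc in json.loads(export_text):
        fields = doc.get("fields", {})
        if fields.get("is_history"):
            continue  # skip query-history entries
        rows.append((doc.get("pk"), doc.get("model"), fields.get("type"), fields.get("uuid")))
    return rows


for pk, model, doc_type, uuid in list_documents(SAMPLE_EXPORT):
    print(pk, model, doc_type, uuid)
```

To inspect a real file, pass the text read from the downloaded export (file name up to you) to `list_documents`.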

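Python chart and data visualization with Seaborn: 2. Categorical data visualization - categorical scatter plots | distribution plots (box plot | violin plot | LV plot) | statistical plots (bar chart | line chart)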

╄→尐↘猪︶ㄣ submitted on 2020-01-16 04:48:53
1. Categorical data visualization - categorical scatter plots: stripplot() / swarmplot()
sns.stripplot(x="day", y="total_bill", data=tips, jitter=True, size=5, edgecolor='w', linewidth=1, marker='o')
    import warnings
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    # Jupyter only: render plots inline in the notebook
    %matplotlib inline
    sns.set_style("whitegrid")         # set the style
    sns.set_context("paper")           # set the scale
    warnings.filterwarnings('ignore')  # suppress warnings
    # 1. stripplot()
    # Draw a scatter plot of the sample data, split by category
    tips = sns.load_dataset("tips")    # load the data
    print(tips.head())
    print(tips['day'].value_counts())
    sns.stripplot(x="day",          # x: grouping field
                  y="total_bill",   # y: field whose distribution is plotted
                  data=tips,
                  jitter=True, size=5, edgecolor='w', linewidth=1, marker='o')

Plotting with Python

匆匆过客 submitted on 2020-01-15 05:53:20
Sine plot:
    # coding: utf-8
    import numpy as np
    import matplotlib.pyplot as plt
    x = np.linspace(0, 10, 1000)
    y = np.sin(x)
    z = np.cos(x**2)
    # Create a figure object and make it the current figure; figsize is (width, height) in inches.
    plt.figure(figsize=(8, 4))
    # label: the curve's name, shown in the legend. Wrapping it in "$" makes matplotlib
    # render it as a formula with its built-in TeX-style math renderer, which looks nicer.
    # color: curve color; linewidth: curve width
    plt.plot(x, y, label="$sin(x)$", color="red", linewidth=2)
    # "b--": blue dashed line (color and line style)
    plt.plot(x, z, "b--", label="$cos(x^2)$")
    plt.xlabel("Time(s)")              # x-axis label
    plt.ylabel("Volt")                 # y-axis label
    plt.title("PyPlot First Example")  # chart title
    plt.ylim(-1.2, 1.2)                # y-axis range
    plt.legend()                       # show the legend
    plt.show()                         # display all figures created so far
Configuration

Running Oozie actions in parallel

拜拜、爱过 submitted on 2020-01-14 01:12:09
Question: I am using the workflow editor in Hue to develop an Oozie workflow. There are a few actions that should be executed in parallel. Is it possible to execute two or more actions concurrently? How can I set it up in Hue?
Answer 1: Yes, it is possible. Among the various Oozie workflow nodes, there are two control nodes, fork and join:
A fork node splits one path of execution into multiple concurrent paths of execution.
A join node waits until every concurrent execution path of a previous fork node arrives
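In workflow XML, a fork lists one `<path start="…"/>` per concurrent branch, and the matching join names the node to continue with once every branch has arrived. A minimal sketch that builds those two control nodes with Python's standard `xml.etree.ElementTree`; the node and action names are hypothetical placeholders, and the surrounding `<workflow-app>` and the action definitions themselves are left out.

```python
import xml.etree.ElementTree as ET


def fork_join_nodes(fork_name, join_name, parallel_actions, next_node):
    """Build <fork> and <join> control nodes for the given (hypothetical) action names."""
    fork = ET.Element("fork", name=fork_name)
    for action in parallel_actions:
        # One <path> per concurrent branch; each branch starts at an action node.
        ET.SubElement(fork, "path", start=action)
    # The join waits for every branch of the fork, then moves on to next_node.
    join = ET.Element("join", name=join_name, to=next_node)
    return fork, join


fork, join = fork_join_nodes("fork-imports", "join-imports",
                             ["import-a", "import-b"], "report")
print(ET.tostring(fork, encoding="unicode"))
print(ET.tostring(join, encoding="unicode"))
```

Each forked action's `<ok to="…"/>` transition would then point at the join node (here `join-imports`), so that all branches converge before the workflow continues.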

Sqoop Free-Form Query Causing Unrecognized Arguments in Hue/Oozie

不羁岁月 submitted on 2020-01-13 19:49:47
Question: I am attempting to run a sqoop command with a free-form query because I need to perform an aggregation. It is submitted via the Hue interface as an Oozie workflow. The following is a scaled-down version of the command and query. When the command is processed, the "--query" statement (enclosed in quotes) causes each portion of the query to be interpreted as an unrecognized argument, as shown in the error following the command. In addition, the target directory is being misinterpreted.
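The symptom matches what happens when a command string is split on whitespace without honoring quotes: the quoted query breaks into many tokens, and everything after it shifts. The Oozie Sqoop action documentation states that a `<command>` element is split on every space, while separate `<arg>` elements are passed through unchanged. The sketch below compares a naive split with a quote-aware `shlex.split` and prints the argument list as `<arg>` elements; the connection string, table, and paths are hypothetical placeholders.

```python
import shlex
from xml.sax.saxutils import escape

# Hypothetical free-form import; connection string, table and paths are placeholders.
COMMAND = ('import --connect jdbc:mysql://db.example/sales '
           '--query "SELECT region, SUM(amount) FROM orders WHERE $CONDITIONS GROUP BY region" '
           '--target-dir /tmp/sales_by_region -m 1')

# Whitespace split that ignores quotes: the query becomes many separate tokens.
naive = COMMAND.split()
# Quote-aware split: the whole query stays a single argument.
proper = shlex.split(COMMAND)

print(len(naive), "tokens vs", len(proper), "arguments")
for arg in proper:
    # One <arg> per argument, XML-escaped for safe embedding in workflow.xml.
    print(f"<arg>{escape(arg)}</arg>")
```

With one argument per `<arg>`, the query no longer needs shell quoting at all, which also keeps `--target-dir` in the position Sqoop expects.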

CDH: Hive or Impala connections from Hue are never released and keep holding resources

一笑奈何 submitted on 2020-01-10 07:42:50
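In Hive > Configuration > HiveServer2 > hive-site.xml, add session timeouts. Note that temporary UDFs stop working once a session times out.
    <property><name>hive.server2.session.check.interval</name><value>3000</value></property>
    <property><name>hive.server2.idle.session.timeout</name><value>0</value></property>
    <property><name>hive.server2.idle.operation.timeout</name><value>0</value></property>
Bare numeric values are read as milliseconds. According to the Hive configuration documentation, a value of 0 disables the corresponding check or timeout, so the two timeouts only close idle sessions and operations when set to positive values.
Source: CSDN  Author: 南宫紫攸  Link: https://blog.csdn.net/weixin_45353054/article/details/103913319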
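To double-check what such a snippet actually configures, the properties can be read back and each value reported in seconds. A minimal sketch using Python's standard XML parser, assuming bare millisecond values (Hive also accepts unit suffixes such as `7d`, which this sketch does not handle) and treating 0 as "disabled" per the Hive configuration docs. Negative values have property-specific meanings and are only reported as-is here.

```python
import xml.etree.ElementTree as ET

# The three properties from the post, wrapped in a <configuration> root so they parse.
SNIPPET = """<configuration>
<property><name>hive.server2.session.check.interval</name><value>3000</value></property>
<property><name>hive.server2.idle.session.timeout</name><value>0</value></property>
<property><name>hive.server2.idle.operation.timeout</name><value>0</value></property>
</configuration>"""


def describe_timeouts(xml_text):
    """Map each property name to a readable form of its millisecond value."""
    result = {}
    for prop in ET.fromstring(xml_text).findall("property"):
        name = prop.findtext("name")
        millis = int(prop.findtext("value"))  # assumes a bare number of milliseconds
        result[name] = "disabled" if millis == 0 else f"{millis / 1000:g} s"
    return result


for name, meaning in describe_timeouts(SNIPPET).items():
    print(name, "->", meaning)
```

For the values in the post this reports a 3-second check interval and both timeouts as disabled.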