================================================================================
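Background: the SQL monitoring is connected to DQC, where the date comes from a function built into the cloud platform. When debugging locally in ODPS, that cloud function is not available, so you have to compute "current date minus 1" yourself in yyyymmdd format, e.g. 20191213.
The DATE_FORMAT(NOW(),'%Y-%m-%d') function in MySQL
Previous day's date: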
DATE_FORMAT(adddate(now(),-1),'%Y%m%d')
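1. Format:
DATE_FORMAT(date, format) displays date or time data in different styles.
1.1 Parameters: date is a valid date;
format specifies the layout of the output date/time.
2. Reference:
DATE_FORMAT(NOW(),'%Y-%m-%d') format conversion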
SELECT DATE_FORMAT(NOW(),'%Y-%m-%d') AS '日期'
The output looks like 2019-12-12.
If you need the date in the 20191212 format, use:
mysql> select date_format(current_timestamp, '%Y%m%d')
-> ;
+------------------------------------------+
| date_format(current_timestamp, '%Y%m%d') |
+------------------------------------------+
| 20191224 |
+------------------------------------------+
Frustratingly, ODPS does not support the MySQL functions, so the ODPS versions are:
select DATEADD(GETDATE(), -1, 'dd') from database.table_name limit 1;
select to_char(DATEADD(GETDATE(), -1, 'dd'),'yyyymmdd') from database.table_name limit 1;
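However, the second SQL did not work. What I wanted was a value like 20191212, but the first statement returns a value like 20191212 15:11:20, and the result of the second came out as yy1212.
Cause: the format string was wrapped in single quotes ''; it should be in double quotes "". Switching to the statement below fixed it: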
select to_char(dateadd(GETDATE(),-1,'dd'),"yyyymmdd") from database.table_name limit 1;
A second way to write the SQL:
select replace(split_part(dateadd(from_unixtime(unix_timestamp()),-1,'dd')," ",1),"-","") from database.table_name
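To cross-check the ODPS result locally, the same "yesterday as yyyymmdd" value can be sketched in Python. This is only an illustration and not part of the monitoring SQL; the function names `yesterday_yyyymmdd` and `yesterday_via_split` are made up here, and the second one imitates the replace(split_part(...)) variant with plain string handling.

```python
from datetime import datetime, timedelta


def yesterday_yyyymmdd(now: datetime) -> str:
    """Format the day before `now` as yyyymmdd (same idea as to_char(dateadd(...), "yyyymmdd"))."""
    return (now - timedelta(days=1)).strftime("%Y%m%d")


def yesterday_via_split(now: datetime) -> str:
    """Same value via string handling, imitating replace(split_part(..., " ", 1), "-", "")."""
    text = (now - timedelta(days=1)).strftime("%Y-%m-%d %H:%M:%S")
    date_part = text.split(" ")[0]     # keep the part before the first space
    return date_part.replace("-", "")  # drop the dashes


print(yesterday_yyyymmdd(datetime.now()))
```

Both helpers take the current time as a parameter, so a fixed datetime can be passed in when comparing against a known partition value.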
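Reflection: MySQL gets this done with one function, while ODPS needs several, which is slower and more tedious. Even so, you should not rely only on the platform's built-in configuration, i.e. ds='${bizdate}'.
Otherwise: 1) your skills will wither; 2) you will not find a job when you go out to interview; 3) with foolproof configuration there is nothing left to learn or improve on once you leave the platform, so keep thinking for yourself. Hard-coding the date for local debugging is unwise, because the table only keeps one week of data. For usability and maintainability, write it as a function; do not cut corners, take the long view.
The SQL finally used is as follows: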
===========================================================================
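Background:
Event tracking for this business line is complex, with 130+ regex tracking cases. The UT platform frequently drops its connection and every regression run is expensive. There is no online monitoring, so by the time BI notices a problem at T+1 it is already too late and the overall dashboard data has been affected. QA also knows best which fields each search tracking scenario requires, so we built tracking monitoring on top of ODPS.
S1 test approach:
1. Search tracking: 21 exposure cases and 64 click cases
Among them, a single exposure case can cover one or more cards, and a single click case can cover multiple cards (114 cases)
// Count how many times K appears in the value of a MySQL column. For example, for the value K3-22,K8-9,K10-2查看详情-二级页, count the occurrences of K; each K is one case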
SELECT widget_name, sum(LENGTH(widget_name) - LENGTH( REPLACE(widget_name,'K','')))
from table_name where event_id=2101 and pd_emp_id in ("11","12","13","14","15" ,"16") ORDER BY gmt_modified DESC
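The LENGTH minus LENGTH(REPLACE(...)) trick in the query above counts a character by measuring how much shorter the string gets once that character is removed. A minimal Python sketch of the same idea follows; `count_marker` is a name invented for this illustration. Python's `len` counts characters while MySQL's `LENGTH` counts bytes, but for a one-byte marker such as K the difference is the same.

```python
def count_marker(value: str, marker: str = "K") -> int:
    """Count `marker` by the length difference after removing it, as the SQL does."""
    return (len(value) - len(value.replace(marker, ""))) // len(marker)


sample = "K3-22,K8-9,K10-2查看详情-二级页"
print(count_marker(sample))  # 3
```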
1.1 Fields validated for exposure cases: soku_test_ab, engine, item_log, aaid, k, track_info, source_from, search_from
The specific rules are as follows (examples):
soku_test_ab under track_info, regex rule: [a-z]{1}
engine under track_info, regex rule: \S*
item_log under track_info, regex rule: \S*
k under track_info, regex rule: \S*
aaid under track_info, regex rule: [0-9a-z]{32}
source_from under track_info, regex rule: (home|discover|vip)
search_from under track_info, regex rule: ^[1-9]\d?$|^1[01]\d$|^10$
track_info must not be empty
spm regex rule: a2h0c.8166622.PhoneXXPictureTab\d*.channeltab\d*(;|,)*
scm regex rule: 20140669.search.rgroupset.filter\S*
1.2 Fields validated for click cases: soku_test_ab, engine, item_log, aaid, k, track_info, spm, scm
Results page:
soku_test_ab under track_info, regex rule: [a-z]{1}
engine under track_info, regex rule: \S*
item_log under track_info, regex rule: \S*
source_from under track_info, regex rule: (home|discover|vip)
search_from under track_info, regex rule: ^[1-9]\d?$|^1[01]\d$|^10$
track_info must not be empty
spm regex rule: a2h0c.8166622.PhoneXXProgramSeries_\d*.poster_\d*
scm regex rule: 20140669.search.\S*.\S*
Default page:
soku_test_ab under track_info, regex rule: [a-z]{1}
aaid under track_info, regex rule: [0-9a-zA-Z]{32}
k under track_info, regex rule: \S*
track_info regex rule: \S*
spm regex rule: a2h0c.8166619.PhoneXXOperate.clearbutton
scm regex rule: 20140669.search.searcharea.clearbutton
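The per-field rules listed for exposure cases can be tried out locally before they are configured on the regex platform. The sketch below is a rough illustration in Python: `TRACK_INFO_RULES`, `failed_fields` and the sample record are all invented here, the search_from rule is left out, and treating each rule as a full match is an assumption, since the article does not say how the platform anchors its patterns.

```python
import re

# Rules copied from the exposure-case list; full-match semantics are an assumption.
TRACK_INFO_RULES = {
    "soku_test_ab": r"[a-z]{1}",
    "engine": r"\S*",
    "item_log": r"\S*",
    "k": r"\S*",
    "aaid": r"[0-9a-z]{32}",
    "source_from": r"(home|discover|vip)",
}


def failed_fields(track_info: dict) -> list:
    """Return the names of fields that are missing or do not fully match their rule."""
    failed = []
    for field, pattern in TRACK_INFO_RULES.items():
        value = track_info.get(field)
        if value is None or re.fullmatch(pattern, str(value)) is None:
            failed.append(field)
    return failed


# Hypothetical record, invented for illustration only.
record = {
    "soku_test_ab": "a",
    "engine": "e1",
    "item_log": "log",
    "k": "keyword",
    "aaid": "0123456789abcdef0123456789abcdef",
    "source_from": "web",
}
print(failed_fields(record))  # ['source_from']
```

Returning the list of failing field names, rather than a single pass/fail flag, makes it obvious which field broke the case.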
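- Progress: exposure tracking needed aaid and k added, progress: 100%. Exposure tracking added soku_test_ab, engine, item_log: 100% complete
Click tracking added the soku_test_ab, engine, item_log fields: 100% complete
85 cases in total, 128 fields
- Division of work:
Exposure tracking
Click tracking
- Test approach:
Before the first gray release of every iteration, run coverage tests on all click and exposure tracking
On the regex platform, filter by business line, tick the cases to generate a test plan, run it manually, and generate a test report (passed, failed, and not-run cases)
When testing is done, QA sends out the test report
- Platforms involved in tracking
Tracking log platform: captures real-time tracking logs
XX: location of the tracking regex cases
FBI monitoring platform: follow the daily tracking monitoring report
Tracking data monitoring platform
OneData alerting platform: all click and exposure cases are connected, but regex alerts are not supported yet; waiting for RD to complete this
- Connecting tracking to regex cases
Tracking monitoring and alerting
- Survey of tracking-related data tables
- Table of all tracking events; exposure is the unsplit raw log, and this is an offline table. The real-time table lives in 特斯拉 and is dangerous to query: more than a hundred billion rows per day
- Tracking log table with a 15-minute delay
- Hourly tracking table
- Underlying T+1 log table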
select alldata.allcnt, faildata.failcnt, round(faildata.failcnt*100.0/alldata.allcnt,4) as failratio
from(
( select count(*) as failcnt,'trackInfo' as question
from (
SELECT a.* ,
get_json_object(a.track_info,'$.show_q') as show_q ,
get_json_object(a.track_info,'$.search_q') as search_q
from
(
select *
FROM xx.xx
WHERE ds=20191216 and site='xx' and (device='android' )
and (original_spm='a2h0c.8166622.home.default' or
original_spm='a2h0c.8166619.xx.default')
and app_version>'8.0.0'
)a
)b
WHERE
b.req_id is null or trim(b.req_id)=''
or b.show_q is null or trim(b.show_q)=''
or b.search_q is null or trim(b.search_q)=''
or b.recext is NULL or trim(b.recext)=''
or b.aaid is NULL or trim(b.aaid)=''
or b.alginfo is null or trim(b.alginfo)=''
)faildata
LEFT JOIN (
select count(*) as allcnt, 'trackInfo' as question
FROM xx.xx
WHERE ds=20191210 and site='xx' and (device='android' )
and (original_spm='a2h0c.8166622.home.default' or
original_spm='a2h0c.8166619.xx.default' )
)alldata
on alldata.question=faildata.question
);
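To compute the ratio between a conditional count and an unconditional one, switch to sum(case when <condition> then 1 else 0 end) and count(1), then divide the two columns.
That is: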
SELECT
SUM(
CASE
WHEN parent_id = 0 THEN
1
ELSE
0
END ) AS parent_id_0_cnt,
COUNT(1) AS total_cnt
FROM table_name
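The sum(case when ...) over count(1) pattern described above can be tried end to end without access to ODPS. The sketch below runs it in SQLite through Python's standard `sqlite3` module, which is a stand-in chosen for this illustration and not the engine used in the article; the `logs` table and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (aaid TEXT)")  # hypothetical table for the demo
conn.executemany(
    "INSERT INTO logs (aaid) VALUES (?)",
    [("a1",), (None,), ("",), ("a4",)],
)

# One pass over the table: conditional count, total count, and their ratio.
fail_cnt, total_cnt, fail_rate = conn.execute(
    """
    SELECT SUM(CASE WHEN aaid IS NULL OR TRIM(aaid) = '' THEN 1 ELSE 0 END) AS fail_cnt,
           COUNT(1) AS total_cnt,
           ROUND(SUM(CASE WHEN aaid IS NULL OR TRIM(aaid) = '' THEN 1 ELSE 0 END) * 1.0
                 / COUNT(1), 4) AS fail_rate
    FROM logs
    """
).fetchone()
print(fail_cnt, total_cnt, fail_rate)  # 2 4 0.5
```

Compared with joining a "failed" subquery to an "all" subquery, this reads the table once and keeps both counts under the same filter.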
Revision of 2019-12-18
select spm, count(distinct case when aaid is null then aaid else null end) over () , count(k) ,
group by spm
sum(expo) over (partition by a.dim) expo_all
Or:
a b c d
select
*, a/total as rate
from (
select count(if(error<>'-1',1,null)) as a
, count(1) as total -- renamed from all, which is a reserved word
, count(if(error='a',1,null)) as a1
-- select sum(if(error<>'-1',1,0)),sum(1),
from (
select *,
case when a is null then 'a' -- a=null never matches; use is null
when b is null then 'b'
else '-1'
end as error
from table_name
) a
) t
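SQL before optimization:
Design-wise:
MySQL also supports putting conditions inside aggregates, but usually only data analysts do this, or it is used for BI statistics. Write everyday SQL like this and the DBA will have your head: it burns a lot of MySQL CPU and is generally slow. If an online service runs this kind of SQL, an interface that returns within a few seconds already counts as fast, and a handful of concurrent requests is enough to drag the database down.
Business-wise: the SQL computes (rows where A or B or C or D is empty) / total_log. Once it is connected to monitoring and an alert fires, you cannot tell which field is wrong. It could be A, B, or C, because what is computed is the overall error rate. You then have to paste the SQL into ODPS and modify it field by field to find out which one is empty.
SQL after optimization:
Compute the error rate separately for A empty, B empty, C empty, and D empty. Because the monitor can only watch a single field, a sum(*) is needed to output one value that is fed to the monitoring system.
Validate 10 fields in one scenario, with counts and ratios. If a field is dropped, troubleshooting only requires pasting the monitoring SQL into ODPS and removing the sum(*); the failure rates of A, B, C, and D are then clearly visible, pinpointing exactly which field is empty.
No need to configure multiple rules or to split by client platform. This cuts redundant copy-paste and the filling in of piles of rules; the setup is leaner and errors stand out.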
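The difference between the two designs can be seen on a tiny in-memory example. The Python sketch below is only an illustration under made-up data: `FIELDS` is a shortened subset of the monitored keys, and `is_blank`, `field_fail_rates` and the sample rows are invented here. It computes one failure rate per field and then the single summed value that would be handed to the monitor.

```python
import json

FIELDS = ["soku_test_ab", "engine", "aaid"]  # subset of the monitored keys, for brevity


def is_blank(value) -> bool:
    """True when the value is missing or only whitespace (the SQL's is null or trim(x)='')."""
    return value is None or str(value).strip() == ""


def field_fail_rates(rows: list) -> dict:
    """Per-field share of rows whose key inside track_info is blank, rounded to 4 places."""
    total = len(rows)
    rates = {}
    for field in FIELDS:
        blanks = sum(1 for row in rows if is_blank(json.loads(row["track_info"]).get(field)))
        rates[field] = round(blanks / total, 4)
    return rates


# Hypothetical log rows, invented for illustration only.
rows = [
    {"track_info": json.dumps({"soku_test_ab": "a", "engine": "e1", "aaid": "x"})},
    {"track_info": json.dumps({"soku_test_ab": "b", "engine": "", "aaid": "y"})},
    {"track_info": json.dumps({"soku_test_ab": "c", "engine": "e3"})},
    {"track_info": json.dumps({"soku_test_ab": "d", "engine": "e4", "aaid": "z"})},
]
rates = field_fail_rates(rows)
print(rates)                          # per-field view, used when investigating an alert
print(round(sum(rates.values()), 4))  # single value handed to the monitor
```

When the summed value triggers an alert, the per-field dictionary shows directly which key is responsible.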
--odps sql
--********************************************************************--
--author:东方
--create time:2019-12-24 16:13:23
--********************************************************************--
-- The `as` keywords in the SQL below can also be omitted
-- // Validation of the keys that must exist for general 柏拉图 requirements
-- public static String[] mustExistKeyOuter = {"spm", "scm"};
-- public static String[] mustExistKeyInner = {"soku_test_ab", "engine", "item_log", "aaid", "k", "source_from", "search_from"};
-- set odps.sql.type.system.odps2=true;
-- select date_format(CURRENT_TIMESTAMP(),"%Y%M%D") from dual;
select
-- Uncomment device, app_version and total_failrate below when running locally to inspect detailed data; comment them out when connecting to DQC
-- device,app_version,
round(spm_rate +scm_rate +track_info_rate+ soku_test_ab_rate+ engine_rate+ item_log_rate+ aaid_rate+ k_rate+ source_from_rate+ search_from_rate ,4)
as total_failrate FROM (select
sum( case when spm is null or trim(spm)='' then 1 else 0 end) as spm_null, sum(1) as log_total,
round(sum( case when spm is null or trim(spm)='' then 1 else 0 end) /sum(1) ,4) as spm_rate,
sum( case when scm is null or trim(scm)='' then 1 else 0 end) as scm_null,
round(sum( case when scm is null or trim(scm)='' then 1 else 0 end) /sum(1),4) as scm_rate,
sum( case when track_info is null or trim(track_info)='' then 1 else 0 end) as track_info_null,
round(sum( case when track_info is null or trim(track_info)='' then 1 else 0 end) /sum(1),4) as track_info_rate,
sum(case when get_json_object(track_info,'$.soku_test_ab') is null or trim(get_json_object(track_info,'$.soku_test_ab'))='' then 1 else 0 end) as soku_test_ab_null,
round(sum( case when get_json_object(track_info,'$.soku_test_ab') is null or trim(get_json_object(track_info,'$.soku_test_ab'))='' then 1 else 0 end) /sum(1),4) as soku_test_ab_rate,
sum( case when get_json_object(track_info,'$.engine') is null or trim(get_json_object(track_info,'$.engine'))='' then 1 else 0 end) as engine_null,
round(sum( case when get_json_object(track_info,'$.engine') is null or trim(get_json_object(track_info,'$.engine'))='' then 1 else 0 end) /sum(1),4) as engine_rate,
sum( case when get_json_object(track_info,'$.item_log') is null or trim(get_json_object(track_info,'$.item_log'))='' then 1 else 0 end) as item_log_null,
round(sum( case when get_json_object(track_info,'$.item_log') is null or trim(get_json_object(track_info,'$.item_log'))=''then 1 else 0 end) /sum(1),4) item_log_rate,
sum( case when get_json_object(track_info,'$.aaid') is null or trim(get_json_object(track_info,'$.aaid'))='' then 1 else 0 end) as aaid_null,
round(sum( case when get_json_object(track_info,'$.aaid') is null or trim(get_json_object(track_info,'$.aaid'))='' then 1 else 0 end) /sum(1),4) as aaid_rate,
sum( case when get_json_object(track_info,'$.k') is null or trim(get_json_object(track_info,'$.k'))='' then 1 else 0 end) as k_null,
round(sum( case when get_json_object(track_info,'$.k') is null or trim(get_json_object(track_info,'$.k'))='' then 1 else 0 end) /sum(1),4) as k_rate,
sum( case when get_json_object(track_info,'$.source_from') is null or trim(get_json_object(track_info,'$.source_from'))='' then 1 else 0 end) as source_from_null,
round(sum( case when get_json_object(track_info,'$.source_from') is null or trim(get_json_object(track_info,'$.source_from'))='' then 1 else 0 end) /sum(1),4) as source_from_rate,
sum( case when get_json_object(track_info,'$.search_from') is null or trim(get_json_object(track_info,'$.search_from'))='' then 1 else 0 end) as search_from_null,
round(sum( case when get_json_object(track_info,'$.search_from') is null or trim(get_json_object(track_info,'$.search_from'))='' then 1 else 0 end) /sum(1),4) as search_from_rate
-- Uncomment device, app_version below when running locally to inspect detailed data; comment them out when connecting to DQC
-- ,device,app_version
from database.table_name
WHERE ds=to_char(dateadd(GETDATE(),-1,'dd'),"yyyymmdd")
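-- The `and hh in(18,19)` filter below is for local debugging only; comment it out when connecting to DQC so that the full 24 hours of online data are monitored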
and hh in(18,19)
-- WHERE ds=replace(split_part(dateadd(from_unixtime(unix_timestamp()),-1,'dd')," ",1),"-","")
and site='xx' and (device='android' or device='iphone' )
and app_version >= '8.3.0'
-- and (spm like '%a2h0c.8166622.rdirect%' OR spm like '%a2h0c.8166622.rmovie%')
-- and (original_spm like '%a2h0c.8166622.PhoneSokuTab%' OR original_spm like '%a2h0c.8166622.PhoneSokuCast%' OR original_spm like '%a2h0c.8166622.PhoneSokuPromote%')
and (original_spm like '%a2h0c.8166622.phonesokutab%' OR original_spm like '%a2h0c.8166622.phonesokucast%' OR original_spm like '%a2h0c.8166622.phonesokupromote%')
-- and (original_scm like '%20140669.search.filter%' OR original_scm like '%20140669.search.person%' OR original_scm like '%20140669.search.circle%')
-- and spm like '%a2h0c.8166622.rdirect%' and (original_spm REGEXP (.*a2h0c\.8166622\.(channeltab|portrait).*)
-- Uncomment GROUP BY device,app_version below when running locally to inspect detailed data; comment it out when connecting to DQC
-- GROUP BY device,app_version
)
;
Result: failure-rate statistics for each field
--odps sql
--********************************************************************--
--author:姝昕
--create time:2019-12-24 16:13:23
--********************************************************************--
-- The `as` keywords in the SQL below can also be omitted
-- // Validation of the keys that must exist for general 柏拉图 requirements
-- public static String[] mustExistKeyOuter = {"spm", "scm"};
-- public static String[] mustExistKeyInner = {"soku_test_ab", "engine", "item_log", "aaid", "k", "source_from", "search_from"};
-- set odps.sql.type.system.odps2=true;
-- select date_format(CURRENT_TIMESTAMP(),"%Y%M%D") from dual;
select
-- Uncomment device, app_version and total_failrate below when running locally to inspect detailed data; comment them out when connecting to DQC
-- device,app_version,
round(spm_rate +scm_rate +track_info_rate+ soku_test_ab_rate+ engine_rate+ item_log_rate+ aaid_rate+ k_rate+ source_from_rate+ search_from_rate ,4)
as total_failrate FROM (select
sum( case when spm is null or trim(spm)='' then 1 else 0 end) as spm_null, sum(1) as log_total,
round(sum( case when spm is null or trim(spm)='' then 1 else 0 end) /sum(1) ,4) as spm_rate,
sum( case when scm is null or trim(scm)='' then 1 else 0 end) as scm_null,
round(sum( case when scm is null or trim(scm)='' then 1 else 0 end) /sum(1),4) as scm_rate,
sum( case when track_info is null or trim(track_info)='' then 1 else 0 end) as track_info_null,
round(sum( case when track_info is null or trim(track_info)='' then 1 else 0 end) /sum(1),4) as track_info_rate,
sum(case when get_json_object(track_info,'$.soku_test_ab') is null or trim(get_json_object(track_info,'$.soku_test_ab'))='' then 1 else 0 end) as soku_test_ab_null,
round(sum( case when get_json_object(track_info,'$.soku_test_ab') is null or trim(get_json_object(track_info,'$.soku_test_ab'))='' then 1 else 0 end) /sum(1),4) as soku_test_ab_rate,
sum( case when get_json_object(track_info,'$.engine') is null or trim(get_json_object(track_info,'$.engine'))='' then 1 else 0 end) as engine_null,
round(sum( case when get_json_object(track_info,'$.engine') is null or trim(get_json_object(track_info,'$.engine'))='' then 1 else 0 end) /sum(1),4) as engine_rate,
sum( case when get_json_object(track_info,'$.item_log') is null or trim(get_json_object(track_info,'$.item_log'))='' then 1 else 0 end) as item_log_null,
round(sum( case when get_json_object(track_info,'$.item_log') is null or trim(get_json_object(track_info,'$.item_log'))=''then 1 else 0 end) /sum(1),4) item_log_rate,
sum( case when get_json_object(track_info,'$.aaid') is null or trim(get_json_object(track_info,'$.aaid'))='' then 1 else 0 end) as aaid_null,
round(sum( case when get_json_object(track_info,'$.aaid') is null or trim(get_json_object(track_info,'$.aaid'))='' then 1 else 0 end) /sum(1),4) as aaid_rate,
sum( case when get_json_object(track_info,'$.k') is null or trim(get_json_object(track_info,'$.k'))='' then 1 else 0 end) as k_null,
round(sum( case when get_json_object(track_info,'$.k') is null or trim(get_json_object(track_info,'$.k'))='' then 1 else 0 end) /sum(1),4) as k_rate,
sum( case when get_json_object(track_info,'$.source_from') is null or trim(get_json_object(track_info,'$.source_from'))='' then 1 else 0 end) as source_from_null,
round(sum( case when get_json_object(track_info,'$.source_from') is null or trim(get_json_object(track_info,'$.source_from'))='' then 1 else 0 end) /sum(1),4) as source_from_rate,
sum( case when get_json_object(track_info,'$.search_from') is null or trim(get_json_object(track_info,'$.search_from'))='' then 1 else 0 end) as search_from_null,
round(sum( case when get_json_object(track_info,'$.search_from') is null or trim(get_json_object(track_info,'$.search_from'))='' then 1 else 0 end) /sum(1),4) as search_from_rate
-- Uncomment device, app_version below when running locally to inspect detailed data; comment them out when connecting to DQC
-- ,device,app_version
from ytalgo_common.dwd_soku_wlapp_clk_h
WHERE ds=to_char(dateadd(GETDATE(),-1,'dd'),"yyyymmdd")
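-- The `and hh in(18,19)` filter below is for local debugging only; comment it out when connecting to DQC so that the full 24 hours of online data are monitored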
and hh in(18,19)
-- WHERE ds=replace(split_part(dateadd(from_unixtime(unix_timestamp()),-1,'dd')," ",1),"-","")
and site='youku' and (device='android' or device='iphone' )
and app_version >= '8.3.0'
-- and (spm like '%a2h0c.8166622.rdirect%' OR spm like '%a2h0c.8166622.rmovie%')
-- and (original_spm like '%a2h0c.8166622.PhoneSokuTab%' OR original_spm like '%a2h0c.8166622.PhoneSokuCast%' OR original_spm like '%a2h0c.8166622.PhoneSokuPromote%')
and (original_spm like '%a2h0c.8166622.phonesokutab%' OR original_spm like '%a2h0c.8166622.phonesokucast%' OR original_spm like '%a2h0c.8166622.phonesokupromote%')
-- and (original_scm like '%20140669.search.filter%' OR original_scm like '%20140669.search.person%' OR original_scm like '%20140669.search.circle%')
-- and spm like '%a2h0c.8166622.rdirect%' and (original_spm REGEXP (.*a2h0c\.8166622\.(channeltab|portrait).*)
-- Uncomment GROUP BY device,app_version below when running locally to inspect detailed data; comment it out when connecting to DQC
-- GROUP BY device,app_version
)
;
Result: sum of the failure rates of the 10 fields
Source: CSDN
Author: QA东方陨
Link: https://blog.csdn.net/weixin_42498050/article/details/103705987