ClickHouse | 易学教程

「3306π」 Beijing Station: A Thoughtfully Organized MySQL DBA Meetup

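Highlights of the 「3306π」 Beijing Station event
1. New format: small-class training in the morning, themed technical talks in the afternoon.
2. New hot topics: MySQL 8.0; pitfalls when applying MGR and PXC; combining MySQL with ClickHouse to meet OLAP needs; the ongoing optimization of MySQL middleware.
3. New opportunities: 3306π has invited nearly 40 industry veterans to attend, building a MySQL-centered technical community in Beijing that helps attendees make new connections, advance their careers, and improve their skills.
4. New surprises: prize draws during the sessions and a grand prize at the close. The grand prize and book prizes are still being prepared; readers are welcome to message the official account with the prizes and book titles they would like, to be handed out on site.

Time and place
March 23, 9:00-18:20. Alibaba Cloud Innovation Center, east side of Units 1-4, Compound 46, Zhongguancun Street, Haidian District, Beijing (Exit A2, Renmin University Station, Metro Line 4).

Registration
1. Tickets: [Standard ticket] afternoon technical talks, 10 yuan. [VIP ticket] afternoon technical talks plus dinner networking, 888 yuan (VIP tickets require approval; reply to the official account to inquire).
2. How to register: scan the QR code in the original post, or click [Read the original].
3. Share the event page to WeChat Moments and collect 30 likes to get a discount code for the afternoon ticket.

This article was shared from the WeChat official account 老叶茶馆 (iMySQL_WX).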

How Does Flink Read from and Write to ClickHouse?

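This article covers how FlinkX reads from and writes to ClickHouse and the related parameters, organized around the following three questions. Reply "X" to the official account to download the plugin.
- Which ClickHouse versions does FlinkX support for reading and writing?
- What parameters does FlinkX take for reading and writing ClickHouse?
- What does each of those parameters mean?

Reading from ClickHouse
1. Plugin name: clickhousereader
2. Supported data source versions: ClickHouse 19.x and later
3. Parameters
「jdbcUrl」 Description: the JDBC connection string for the relational database; see the official clickhouse-jdbc documentation for the jdbcUrl format. Required: yes. Default: none.
「username」 Description: the user name for the data source. Required: yes. Default: none.
「password」 Description: the password for the specified user name. Required: yes. Default: none.
「where」 Description: the filter condition. The reader plugin builds a SQL statement from the specified column, table, and where settings and extracts data with that statement. In practice, jobs often sync only the current day's data, which can be done by setting where to gmt_create > time. Note: where cannot be set to limit 10, because limit is not a valid SQL WHERE clause. Required: no. Default: none.
「splitPk」 Description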
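The way the reader combines column, table, and where into one statement can be illustrated with a small sketch. This is not FlinkX's actual implementation: the function name `build_select` and the table and column names are hypothetical, and the check for `limit` is only one simple way to reject the invalid setting described above.

```python
from typing import List, Optional


def build_select(table: str, columns: List[str], where: Optional[str] = None) -> str:
    """Assemble a SELECT statement from column, table, and where settings.

    Hypothetical helper for illustration; the real plugin may build its SQL differently.
    """
    condition = (where or "").strip()
    # A condition such as "limit 10" is not a valid WHERE clause, so reject it early.
    if condition and condition.split()[0].lower() == "limit":
        raise ValueError("where must be a filter condition, not a LIMIT clause")
    sql = "SELECT {} FROM {}".format(", ".join(columns), table)
    if condition:
        sql += " WHERE " + condition
    return sql


# Hypothetical table and columns; the filter mirrors the gmt_create example above.
query = build_select("orders", ["id", "gmt_create"], "gmt_create > '2021-01-01 00:00:00'")
print(query)
```

Calling `build_select("orders", ["id"], "limit 10")` raises `ValueError`, which mirrors the note that limit cannot be used as the where setting.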

User Retention Analysis Case Studies | JD.com, Taobao, and Ele.me as Examples!

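We call users who complete activation and then, over a period of time, keep performing key actions such as using, browsing, or purchasing "retained users". In user growth work we tend to emphasize acquisition and overlook retention. Even if we acquire many users, poor retention means that once churned users outnumber newly acquired ones, the user base shrinks and growth cannot be sustained. Think of a pool: the water level rises only when more water flows in than flows out. The job of user retention is to plug the leaks in the pool. As the traffic dividend disappears, acquiring users becomes harder and more expensive, so we should put more of our effort into retaining the users we already have. How can retention be improved?

Defining the standard for user retention
Looking at the three user retention curves above, the first and second curves flatten out gradually after the initial churn, while the bottom curve keeps falling until the number of remaining users approaches zero, meaning almost all the water has leaked out of the pool. That is very poor retention.
[Figure: retention curves]
Good retention means the curve flattens out after an initial decline, and the higher the level at which it flattens, the better the retention. Of the first two curves, the first flattens at around 60% and the second only at around 40%. The goal of retention work is to raise the level at which the curve flattens.

1. Focus on the retention rate
When judging retention, never look only at the number of retained users. That number has value, but the retention rate is more valuable. Looking only at the number of retained users can mislead us; for example, the users retained from this campaign number 1 million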
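The difference between retained user count and retention rate is simple arithmetic, sketched below. The 1 million retained users comes from the text; the acquired-user figures and the second campaign are hypothetical numbers chosen only to show how a larger retained count can hide a lower rate.

```python
def retention_rate(retained_users: int, acquired_users: int) -> float:
    """Share of acquired users who are still active after the chosen period."""
    if acquired_users <= 0:
        raise ValueError("acquired_users must be positive")
    return retained_users / acquired_users


# Hypothetical campaigns: A retains more users in absolute terms, B retains a larger share.
campaign_a = retention_rate(1_000_000, 10_000_000)
campaign_b = retention_rate(500_000, 1_000_000)
print(f"A: {campaign_a:.0%}, B: {campaign_b:.0%}")
```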

ClickHouse: Usage of hash and internal_replication in Distributed & Replicated tables

Question: I have read this in the Distributed Engine documentation about the internal_replication setting: If this parameter is set to ‘true’, the write operation selects the first healthy replica and writes data to it. Use this alternative if the Distributed table “looks at” replicated tables. In other words, if the table where data will be written is going to replicate them itself. If it is set to ‘false’ (the default), data is written to all replicas. In essence, this means that the Distributed table
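The write behavior quoted from the documentation can be modeled with a toy function. This sketch does not talk to ClickHouse; the function name `replicas_written` and the replica names are hypothetical, and it models only which replicas of a single shard receive the write.

```python
from typing import Dict, List


def replicas_written(replicas: Dict[str, bool], internal_replication: bool) -> List[str]:
    """Return the replicas of one shard that the Distributed table writes to.

    `replicas` maps a replica name to whether it is currently healthy.
    """
    if internal_replication:
        # Write to the first healthy replica only; the replicated table
        # underneath is expected to copy the data to the other replicas.
        for name, healthy in replicas.items():
            if healthy:
                return [name]
        return []
    # With the default (false), the Distributed table writes to every replica itself.
    return list(replicas)


shard = {"replica1": False, "replica2": True, "replica3": True}
print(replicas_written(shard, internal_replication=True))
print(replicas_written(shard, internal_replication=False))
```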

Restrict User Access Rights In ClickHouse

Question: I have created multiple databases in ClickHouse and a new user. Can I restrict that newly created user so that it can access only a particular database?

Answer 1: In users.xml, in the 'user' section (near profile, quota...), you can specify an optional section:

    <allow_databases>
        <database>default</database>
        <database>test</database>
    </allow_databases>

If there is no 'allow_databases' section, access to all databases is allowed. Access to database 'system' is always allowed (because system
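If the list of allowed databases is maintained in code, the fragment from the answer can be generated rather than typed by hand. The sketch below uses only Python's standard library; the user name `report_user` and the helper name are hypothetical, and the output is just the per-user element, which would still need to be placed inside the existing users section of users.xml.

```python
import xml.etree.ElementTree as ET
from typing import List


def user_with_allowed_databases(user_name: str, databases: List[str]) -> str:
    """Build the per-user XML element with an allow_databases section."""
    user = ET.Element(user_name)
    allow = ET.SubElement(user, "allow_databases")
    for db in databases:
        ET.SubElement(allow, "database").text = db
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(user, encoding="unicode")


fragment = user_with_allowed_databases("report_user", ["default", "test"])
print(fragment)
```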

insert using pandas to_sql() missing data into clickhouse db

Question: It's my first time using sqlalchemy and pandas to insert some data into a clickhouse db. When I try to insert some data using clickhouse cli it works fine, but when I tried to do the same thing using sqlalchemy I don't know why one row is missing. Have I done something wrong?

    import pandas as pd
    # created the dataframe
    engine = create_engine(uri)
    session = make_session(engine)
    metadata = MetaData(bind=engine)
    metadata.reflect(bind=engine)
    conn = engine.connect()
    df.to_sql('test', conn, if

Accessing ClickHouse from PostgreSQL with clickhousedb_fdw

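Author: 杨杰

Introduction
PostgreSQL FDW is an external access interface that can be used to access data stored outside the database; that data may live in another PostgreSQL database or in databases such as MySQL and ClickHouse. ClickHouse is a fast, open-source, column-oriented OLAP database management system that lets you generate analytical reports in real time using SQL queries. clickhouse_fdw is an open-source foreign data wrapper (FDW) for accessing the ClickHouse column-store database. Two implementations currently exist:
https://github.com/adjust/clickhouse_fdw receives commits continuously and currently supports PostgreSQL 11-13.
https://github.com/Percona-Lab/clickhousedb_fdw was inactive for about a year, recently merged changes from adjust/clickhouse_fdw, and now also supports PostgreSQL 11-13.
This article uses adjust/clickhouse_fdw as the example.

Installation

    # libcurl >= 7.43.0
    yum install libcurl-devel libuuid-devel
    git clone https://github.com/adjust/clickhouse_fdw.git
    cd clickhouse_fdw
    mkdir build && cd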

Clickhouse: split output on select

Question: When performing a select in ClickHouse on a MergeTree table that is loaded from a KafkaEngine table via a Materialized View, a simple select shows the output split into groups in clickhouse-client:

    :) select * from customersVisitors;

    SELECT * FROM customersVisitors

    ┌────────day─┬─────────createdAt───┬──────────────────_id─┬───────────mSId─┬───────xId──┬─yId─┐
    │ 2018-08-17 │ 2018-08-17 11:42:04 │ 8761310857292948227 │ DV-1811114459 │ 846817 │ 0 │
    │ 2018-08-17 │ 2018-08-17 11:42:04 │

How to SELECT data based on both a period of date and a period of time in ClickHouse

Question: I want to filter some data by both yyyymmdd (date) and hhmmss (time), but ClickHouse doesn't support a time type, so I chose datetime to combine them. But how do I do something like the following?

This is the DolphinDB code (DolphinDB supports a second type to represent hhmmss):

    select avg(ofr + bid) / 2.0 as avg_price from taq where date between 2007.08.05 : 2007.08.07, time between 09:30:00 : 16:00:00 group by symbol, date

This is the ClickHouse code, which is logically problematic:

    SELECT avg(ofr + bid) / 2.0 AS avg
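The semantics the question is after, a date range combined with a separate time-of-day range, can be stated precisely in a few lines of Python. This is only a model of the intended filter, not ClickHouse syntax; the function name is hypothetical, while the dates and times are the ones from the DolphinDB query.

```python
from datetime import date, datetime, time


def in_date_and_time_window(ts: datetime, first_day: date, last_day: date,
                            open_time: time, close_time: time) -> bool:
    """True when ts falls on a day in the date range AND within the daily time range."""
    return first_day <= ts.date() <= last_day and open_time <= ts.time() <= close_time


window = (date(2007, 8, 5), date(2007, 8, 7), time(9, 30), time(16, 0))
during_session = in_date_and_time_window(datetime(2007, 8, 6, 10, 0, 0), *window)
after_close = in_date_and_time_window(datetime(2007, 8, 6, 20, 0, 0), *window)
print(during_session, after_close)
```

The 20:00 row lies between the first day's 09:30 and the last day's 16:00, so a single datetime range would admit it. Checking the date part and the time part separately excludes it.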
