Question
When I run a select in ClickHouse on a MergeTree table that is loaded from a Kafka engine table via a materialized view, clickhouse-client shows the output split into groups:
:) select * from customersVisitors;
SELECT * FROM customersVisitors
┌────────day─┬─────────createdAt───┬──────────────────_id─┬───────────mSId─┬───────xId──┬─yId─┐
│ 2018-08-17 │ 2018-08-17 11:42:04 │ 8761310857292948227 │ DV-1811114459 │ 846817 │ 0 │
│ 2018-08-17 │ 2018-08-17 11:42:04 │ 11444873433837702032 │ DV-2164132903 │ 780066 │ 0 │
└────────────┴─────────────────────┴──────────────────────┴────────────────┴────────────┴─────┘
┌────────day─┬─────────createdAt───┬──────────────────_id─┬───────────────────mSId──┬────────xId─┬─yId─┐
│ 2018-08-17 │ 2018-08-17 10:25:11 │ 14403835623731794748 │ DV-07680633204819271839 │ 307597 │ 0 │
└────────────┴─────────────────────┴──────────────────────┴─────────────────────────┴────────────┴─────┘
3 rows in set. Elapsed: 0.013 sec.
The table engine is ENGINE = MergeTree(day, (mSId, xId, day), 8192)
Why does the output appear split into two groups?
Answer 1:
If I'm not mistaken, the output is split when the data comes from different blocks, which often also means the blocks were processed in different threads. If you want to get rid of the split, wrap your query in an outer select:
select * from (...)
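As a concrete sketch using the table from the question, the wrapped query could look like the first statement below. The second statement is an alternative that is not part of the answer above: the PrettyCompactMonoBlock output format buffers rows (up to a limit) and prints them as one table instead of one table per block. Whether the wrapped form collapses the groups may depend on the ClickHouse version, so treat both as things to try rather than guaranteed fixes.

```sql
-- Wrap the original query in an outer SELECT, as suggested above.
SELECT * FROM (SELECT * FROM customersVisitors);

-- Alternative (an addition, not from the answer): ask the client to
-- print the buffered rows as a single table.
SELECT * FROM customersVisitors FORMAT PrettyCompactMonoBlock;
```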
Answer 2:
The MergeTree engine is designed for fast writes and reads.
Fast writes are achieved by inserting data in parts; the parts are then merged in the background into a single part for faster reads.
You can see the data parts in the following directory:
ls /var/lib/clickhouse/data/database_name/table_name
If you run the following query, you will find that the data is now returned in a single group and that a new, merged part has appeared at the above location:
optimize table MY_TABLE_NAME
OPTIMIZE TABLE forces the parts to be merged, but in usual cases you can just leave merging to ClickHouse.
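To check the parts without looking at the file system, the system.parts table can be queried before and after the merge. This is a minimal sketch, assuming the table from the question (customersVisitors) lives in the current database:

```sql
-- List the active data parts of the table; each row is one part on disk.
SELECT partition, name, rows
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'customersVisitors'
  AND active;

-- Request a merge, then run the SELECT above again to compare.
OPTIMIZE TABLE customersVisitors;
```

If the two groups in the question came from two separate parts, the first query should list two parts before the OPTIMIZE and fewer afterwards.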
Source: https://stackoverflow.com/questions/51899472/clickhouse-split-output-on-select