Is KSQL making remote requests under the hood, or is a Table actually a global KTable?

Question

I have a Kafka topic containing customer records, called "customer-created". Each customer is a new record in the topic. There are 4 partitions.

I have two ksql-server instances running, based on the docker image confluentinc/cp-ksql-server:5.3.0. Both use the same KSQL Service Id.

I've created a table:

CREATE TABLE t_customer (id VARCHAR, 
                         firstname VARCHAR, 
                         lastname VARCHAR)
WITH (KAFKA_TOPIC = 'customer-created', 
      VALUE_FORMAT='JSON', 
      KEY = 'id');

I'm new to KSQL, but my understanding was that KSQL builds on top of Kafka Streams and that each ksql-server instance is roughly equivalent to a Kafka Streams application instance. The first thing I notice is that as soon as I start a new instance of the ksql-server, it already knows about the tables/streams created on the first instance, even though it is an interactive instance in developer mode. Second, I can select the same customer by its ID from both instances. I expected to be able to do that from only one of them, because I assumed a KSQL Table is equivalent to a KTable, i.e. it should contain only local data from the partitions being processed by that ksql-server instance.

SET 'auto.offset.reset'='earliest';
select * from t_customer where id = '7e1a141b-b8a6-4f4a-b368-45da2a9e92a1';

Regardless of which instance of the ksql-server I attach the ksql-cli to, I get a result. The only way I can get this to work with plain Kafka Streams is to use a global KTable. Getting the result from both instances surprised me a little because, according to the docs, "Only the Kafka Streams DSL has the notion of a GlobalKTable", so I expected only one of the two instances to find the customer. I haven't found any docs that explain how to specify whether a KSQL Table should be a local or a global table.

So here is my question: is a KSQL Table the equivalent of a global KTable and the docs are misleading, or is the ksql-server instance that I am connected to making a remote request under the hood to the instance responsible for the ID (presumably based on the partition), as described here for Kafka Streams?

Answer 1:

KSQL does not support GlobalKTables at the moment.

Your analogy between a KSQL server and a Kafka Streams program is not 100% accurate, though. Each query is a Kafka Streams program (note that a "program" can have multiple instances). Also, there is a difference between persistent queries and transient queries. When you create a TABLE from a topic, the command itself is a metadata operation only (the same holds for CREATE STREAM from a topic). In both cases, no query is executed and no Kafka Streams program is started.

The information about all created STREAMs and TABLEs is stored in a shared "command topic" in the Kafka cluster. All servers with the same service ID receive the same information about created streams and tables.
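As a rough way to check this yourself, assuming the t_customer table from the question, you could attach the CLI to the second server and list the metadata it has picked up. Both statements below only read metadata, so they should not start a Kafka Streams program.

```sql
-- Run from a CLI attached to either server.
-- The definitions come from the shared command topic, not from local state.
SHOW TABLES;

-- Shows the columns and key field registered for the table from the question.
DESCRIBE t_customer;
```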

Queries run in the CLI are transient queries, and each one is executed by a single server. The information about such transient queries is not distributed to other servers. Basically, a unique query ID (i.e., application.id) is generated and the server runs a single-instance KafkaStreams program. Hence, that server/program will subscribe to all partitions, which is why you get a result no matter which server the CLI is attached to.

A persistent query (i.e., CREATE STREAM AS or CREATE TABLE AS) is a query that reads a STREAM or TABLE and produces a STREAM or TABLE as output. The information about persistent queries is distributed via the "command topic" to all servers (however, not all servers will execute all persistent queries; how many execute a given query depends on the configured parallelism). For persistent queries, each server that participates in executing the query creates a KafkaStreams instance running the same program. All of them use the same query ID (i.e., application.id), so they form one group and different servers are assigned different partitions.
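For illustration, a persistent query over the table from the question might look like the sketch below. The name t_customer_smith and the filter value 'Smith' are made up for this example and are not part of the original setup.

```sql
-- Hypothetical persistent query: the statement is written to the command topic,
-- so every server with the same service ID learns about it.
CREATE TABLE t_customer_smith AS
  SELECT id, firstname, lastname
  FROM t_customer
  WHERE lastname = 'Smith';

-- Lists the persistent queries and their query IDs.
SHOW QUERIES;
```

Unlike the transient SELECT in the question, this query keeps running after the CLI disconnects, and the input partitions are divided among the servers that execute it.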

Source: https://stackoverflow.com/questions/57333738/is-ksql-making-remote-requests-under-the-hood-or-is-a-table-actually-a-global-k

Tags

apache-kafka-streams

ksql