Question
This question follows this one.
The main task is to do the joins on the KSQL side. The example below will illustrate it. Incident messages arrive in a Kafka topic. The structure of those messages:
[
{
"name": "from_ts",
"type": "bigint"
},
{
"name": "to_ts",
"type": "bigint"
},
{
"name": "rulenode_id",
"type": "int"
}
]
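For illustration, a concrete message matching this schema might look like this (values made up):
{
  "from_ts": 1583321230000,
  "to_ts": 1583321290000,
  "rulenode_id": 42
}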
And there is a Postgres table rulenode:
id | name | description
Data from both sources needs to be joined on rulenode_id = rulenode.id, so as to get a single record with the fields from_ts, to_ts, rulenode_id, rulenode_name, rulenode_description.
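For example (made-up values): an incident (from_ts=100, to_ts=200, rulenode_id=42) joined with the rulenode row (42, 'alert', 'test rule') should yield (from_ts=100, to_ts=200, rulenode_id=42, rulenode_name='alert', rulenode_description='test rule').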
I want to do this by means of KSQL, not in the backend as it is now.
Right now data from the Postgres table is transferred to Kafka by the JdbcSourceConnector. But there is one little problem: as you could guess, data in the Postgres table may change, and of course these changes should end up in the KSQL stream OR table too.
Below I've been asked why a KTable and not a KStream. Well, please visit this page and look at the first GIF. There, records of the table are updated when new data arrives. I thought such behaviour is what I need (where instead of the names Alice, Bob I have the primary key id of the Postgres table rulenode). That's why I chose KTable.
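In other words, a table keeps only the latest value per key. A minimal sketch of that behaviour, with made-up data:
record arriving on the topic       table state afterwards
{id: 1, name: "A"}                 1 -> A
{id: 2, name: "B"}                 1 -> A, 2 -> B
{id: 1, name: "A2"}                1 -> A2, 2 -> B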
Bulk mode of the JdbcSourceConnector copies the whole table on every poll. So, as you could guess, every poll appends a fresh snapshot of all Postgres rows to the Kafka topic, on top of the previous snapshots.
As suggested I created a connector with configs:
{
"name": "from-pg",
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"errors.log.enable": "true",
"connection.url": "connection.url",
"connection.user": "postgres",
"connection.password": "*************",
"table.whitelist": "rulenode",
"mode": "bulk",
"poll.interval.ms": "5000",
"topic.prefix": "pg."
}
Then created a stream:
create stream rulenodes
with (kafka_topic='pg.rules_rulenode', value_format='avro', key='id');
and now trying to create a table:
create table rulenodes_unique
as select * from rulenodes;
but that didn't work with error:
Invalid result type. Your SELECT query produces a STREAM. Please use CREATE STREAM AS SELECT statement instead.
I read that tables are used to store aggregated info. For example, to store data aggregated with the COUNT function:
create table rulenodes_unique
as select id, count(*) from rulenodes group by id;
Can you say please how to handle that error?
Answer 1:
You can create a STREAM or a TABLE on top of a Kafka topic with ksqlDB - it's to do with how you want to model the data. From your question it is clear that you need to model it as a table (because you want to join to the latest version of a key). So you need to do this:
create table rulenodes
with (kafka_topic='pg.rules_rulenode', value_format='avro');
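Note that the exact CREATE TABLE syntax depends on the ksqlDB version. In newer versions (roughly 0.10 onwards) the KEY hint is gone and the key column is declared in the schema itself; a sketch of the equivalent statement, assuming the rulenode columns shown in the question:
CREATE TABLE rulenodes (
  id INT PRIMARY KEY,
  name VARCHAR,
  description VARCHAR
) WITH (kafka_topic='pg.rules_rulenode', value_format='avro');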
Now there is one more thing you have to do, which is ensure that the data in your topic is correctly keyed. You cannot just specify key='id' and have it automagically happen - the key parameter is just a 'hint'. You must make sure that the messages in the Kafka topic have the id field in the key. See the reference docs for full details.
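If you are unsure how the messages are actually keyed, you can inspect the raw topic from the ksqlDB CLI:
PRINT 'pg.rules_rulenode' FROM BEGINNING;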
You can do this with a Single Message Transform in Kafka Connect:
"transforms":"createKey,extractInt",
"transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields":"id",
"transforms.extractInt.type":"org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractInt.field":"id"
Or you can do it in ksqlDB by changing the key - and because we want to process every event, we first model it as a stream (!) and then declare the table over the re-keyed topic:
create stream rulenodes_source
with (kafka_topic='pg.rules_rulenode', value_format='avro');
CREATE STREAM RULENODES_REKEY AS SELECT * FROM rulenodes_source PARTITION BY id;
CREATE TABLE rulenodes WITH (kafka_topic='RULENODES_REKEY', value_format='avro');
I would go the Single Message Transform route because it is neater and simpler overall.
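For completeness: once the correctly-keyed table exists, the join the question is ultimately after would look roughly like this, assuming a stream named incidents has been declared over the incidents topic (the names here are illustrative, not from the original answer):
CREATE STREAM incidents_enriched AS
  SELECT i.from_ts,
         i.to_ts,
         i.rulenode_id,
         r.name AS rulenode_name,
         r.description AS rulenode_description
  FROM incidents i
  JOIN rulenodes r ON i.rulenode_id = r.id;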
Answer 2:
It's not clear which statement throws the error, but the message is misleading if it comes from the table definition.
You can create tables from topics directly. No need to go through a stream:
https://docs.confluent.io/current/ksql/docs/developer-guide/create-a-table.html
If you want to use the stream as well, as the docs say:
Use the CREATE TABLE AS SELECT statement to create a table with query results from an existing table or stream.
You may want to use case-sensitive values in the statements:
CREATE STREAM rulenodes WITH (
KAFKA_TOPIC ='pg.rules_rulenode',
VALUE_FORMAT='AVRO',
KEY='id'
);
CREATE TABLE rulenodes_unique AS
    SELECT id, COUNT(*) FROM rulenodes
    GROUP BY id;
Source: https://stackoverflow.com/questions/60491786/from-postgres-to-kafka-with-changes-tracking