Read pipe-separated values in KSQL


Question


I am working on a POC where I have to read a pipe-separated values file and insert the records into MS SQL Server. I am using Confluent 5.4.1 so that I can use the VALUE_DELIMITER property when creating a stream, but it throws the exception: Delimeter only supported with DELIMITED format

1. Start Confluent (version: 5.4.1):

[Dev root @ myip ~]
# confluent local start
    The local commands are intended for a single-node development environment
    only, NOT for production usage. https://docs.confluent.io/current/cli/index.html

Using CONFLUENT_CURRENT: /tmp/confluent.vHhSRAnj
Starting zookeeper
zookeeper is [UP]
Starting kafka
kafka is [UP]
Starting schema-registry
schema-registry is [UP]
Starting kafka-rest
kafka-rest is [UP]
Starting connect
connect is [UP]
Starting ksql-server
ksql-server is [UP]
Starting control-center
control-center is [UP]
[Dev root @ myip ~]
# jps
49923 KafkaRestMain
50099 ConnectDistributed
49301 QuorumPeerMain
50805 KsqlServerMain
49414 SupportedKafka
52103 Jps
51020 ControlCenter
1741
49646 SchemaRegistryMain
[Dev root @ myip ~]
#

2. Create Topic:

[Dev root @ myip ~]
# kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic SampleData
Created topic SampleData.
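
If you want to confirm the topic's settings before producing to it, kafka-topics can describe it; this is just a quick sanity check against the same ZooKeeper connection used above:

kafka-topics --describe --zookeeper localhost:2181 --topic SampleData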

3. Provide pipe-separated data to the SampleData topic:

[Dev root @ myip ~]
# kafka-console-producer --broker-list localhost:9092 --topic SampleData <<EOF
> this is col1|and now col2|and col 3 :)
> EOF
>>[Dev root @ myip ~]
#
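
To confirm the record actually reached the topic before involving KSQL, a console consumer read from the beginning is a quick check, using the same broker listener (localhost:9092) as above:

kafka-console-consumer --bootstrap-server localhost:9092 --topic SampleData --from-beginning --max-messages 1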

4. Start KSQL:

[Dev root @ myip ~]
# ksql

                  ===========================================
                  =        _  __ _____  ____  _             =
                  =       | |/ // ____|/ __ \| |            =
                  =       | ' /| (___ | |  | | |            =
                  =       |  <  \___ \| |  | | |            =
                  =       | . \ ____) | |__| | |____        =
                  =       |_|\_\_____/ \___\_\______|       =
                  =                                         =
                  =  Streaming SQL Engine for Apache Kafka® =
                  ===========================================

Copyright 2017-2019 Confluent Inc.

CLI v5.4.1, Server v5.4.1 located at http://localhost:8088

Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!

5. Declare a schema for the existing topic SampleData:

ksql> CREATE STREAM sample_delimited (
>       column1 varchar(1000),
>       column2 varchar(1000),
>       column3 varchar(1000))
>       WITH (KAFKA_TOPIC='SampleData', VALUE_FORMAT='DELIMITED', VALUE_DELIMITER='|');

 Message
----------------
 Stream created
----------------
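
To double-check what KSQL registered, DESCRIBE prints the stream's columns, and DESCRIBE EXTENDED additionally shows the backing topic and value format:

ksql> DESCRIBE sample_delimited;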

6. Verify the data in the KSQL stream:


ksql>  SET 'auto.offset.reset' = 'earliest';
Successfully changed local property 'auto.offset.reset' to 'earliest'. Use the UNSET command to revert your change.
ksql> SELECT * FROM sample_delimited emit changes limit 1;
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
|ROWTIME                    |ROWKEY                     |COLUMN1                    |COLUMN2                    |COLUMN3                    |
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
|1584339233947              |null                       |this is col1               |and now col2               |and col 3 :)               |
Limit Reached
Query terminated

7. Write a new Kafka topic, SampleDataAvro, that serializes all of the data from the sample_delimited stream to Avro:

ksql> CREATE STREAM sample_avro WITH (KAFKA_TOPIC='SampleDataAvro', VALUE_FORMAT='AVRO') AS SELECT * FROM sample_delimited;
Delimeter only supported with DELIMITED format
ksql>

8. The above statement fails with the exception:

Delimeter only supported with DELIMITED format

9. Load the MS SQL Kafka Connect sink configuration:

confluent local load test-sink -- -d ./etc/kafka-connect-jdbc/sink-quickstart-mssql.properties
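
For reference, a JDBC sink configuration for MS SQL would contain settings along these lines; the property names below are standard Kafka Connect JDBC sink settings, but the connection details are placeholders rather than the contents of the actual shipped quickstart file:

name=test-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
# topic that the Avro CTAS statement is intended to populate
topics=SampleDataAvro
# placeholder connection details; adjust host, port, and credentials
connection.url=jdbc:sqlserver://localhost:1433;databaseName=testdb
connection.user=sa
connection.password=<password>
# create the target table automatically and use plain inserts
auto.create=true
insert.mode=insert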


Answer 1:


The only time you need to specify the delimiter is when you define the stream that is reading from the source topic.

Here's my worked example:

  1. Populate a topic with pipe-delimited data:

    $ kafkacat -b localhost:9092 -t SampleData -P<<EOF
    this is col1|and now col2|and col 3 :)
    EOF
    
  2. Declare a stream over it

    CREATE STREAM sample_delimited ( 
            column1 varchar(1000),
            column2 varchar(1000),
            column3 varchar(1000)) 
            WITH (KAFKA_TOPIC='SampleData', VALUE_FORMAT='DELIMITED', VALUE_DELIMITER='|');
    
  3. Query the stream to make sure it works

    ksql> SET 'auto.offset.reset' = 'earliest';
    Successfully changed local property 'auto.offset.reset' to 'earliest'. Use the UNSET command to revert your change.
    
    ksql> SELECT * FROM sample_delimited emit changes limit 1;
    +----------------+--------+---------------+--------------+--------------+
    |ROWTIME         |ROWKEY  |COLUMN1        |COLUMN2       |COLUMN3       |
    +----------------+--------+---------------+--------------+--------------+
    |1583933584658   |null    |this is col1   |and now col2  |and col 3 :)  |
    Limit Reached
    Query terminated
    
  4. Reserialise the data to Avro:

    CREATE STREAM sample_avro WITH (KAFKA_TOPIC='SampleDataAvro', VALUE_FORMAT='AVRO') AS SELECT * FROM sample_delimited;
    
  5. Dump the contents of the topic - note that it is now Avro:

    ksql> print SampleDataAvro;
    Key format: UNDEFINED
    Value format: AVRO
    rowtime: 3/11/20 1:33:04 PM UTC, key: <null>, value: {"COLUMN1": "this is col1", "COLUMN2": "and now col2", "COLUMN3": "and col 3 :)"}
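
Because the CTAS reserialised the data to Avro, the Schema Registry should now hold a value schema for the topic; assuming the default local port 8081, it can be fetched straight from the Schema Registry REST API:

    curl -s http://localhost:8081/subjects/SampleDataAvro-value/versions/latest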
    

The error that you're hitting is the result of bug #4200. You can wait for the next release of Confluent Platform, or use standalone ksqlDB, in which the issue is already fixed.

Here's ksqlDB 0.7.1 streaming the data to MS SQL:

CREATE SINK CONNECTOR SINK_MSSQL WITH (
    'connector.class'     = 'io.confluent.connect.jdbc.JdbcSinkConnector',
    'connection.url'      = 'jdbc:sqlserver://mssql:1433',
    'connection.user'     = 'sa',
    'connection.password' = 'Admin123',
    'topics'              = 'SampleDataAvro',
    'key.converter'       = 'org.apache.kafka.connect.storage.StringConverter',
    'auto.create'         = 'true',
    'insert.mode'         = 'insert'
  );
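
Once created, the connector can be inspected from the same prompt; SHOW CONNECTORS lists it, and DESCRIBE CONNECTOR should report its state and tasks (both statements ship with ksqlDB's connector management):

-- list connectors known to ksqlDB, then check this sink's status
SHOW CONNECTORS;
DESCRIBE CONNECTOR SINK_MSSQL;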

Now query the data in MS SQL:

1> Select @@version
2> go

---------------------------------------------------------------------
Microsoft SQL Server 2017 (RTM-CU17) (KB4515579) - 14.0.3238.1 (X64)
        Sep 13 2019 15:49:57
        Copyright (C) 2017 Microsoft Corporation
        Developer Edition (64-bit) on Linux (Ubuntu 16.04.6 LTS)

(1 rows affected)
1> SELECT * FROM SampleDataAvro;
2> GO
COLUMN3        COLUMN2         COLUMN1     
-------------- --------------- ------------------
and col 3 :)   and now col2    this is col1

(1 rows affected)


Source: https://stackoverflow.com/questions/60631130/read-pipe-separated-values-in-ksql
