Apache Pulsar IO之CDC Debezium Connector (SQLServer)

*爱你&永不变心* 提交于 2019-12-08 11:52:12

启动Pulsar

  1. 从官网下载tar.gz安装包

下载地址:https://archive.apache.org/dist/pulsar/pulsar-2.4.0/apache-pulsar-2.4.0-bin.tar.gz

$ wget https://archive.apache.org/dist/pulsar/pulsar-2.4.0/apache-pulsar-2.4.0-bin.tar.gz
  1. 解压
$ tar zxvf apache-pulsar-2.4.0-bin.tar.gz
  1. 创建connectors文件夹
$ cd apache-pulsar-2.4.0
$ mkdir connectors
  1. 启动pulsar
$ bin/pulsar standalone

OR

$ bin/pulsar standalone -nss

创建pulsar-io-debezium-sqlserver-1.0.nar文件

创建maven项目

添加依赖

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.apache.pulsar</groupId>
        <artifactId>pulsar-io</artifactId>
        <version>2.4.0</version>
    </parent>

    <artifactId>pulsar-sqlserver</artifactId>
    <name>Pulsar</name>

    <dependencies>
        <dependency>
            <groupId>${project.groupId}</groupId>
            <artifactId>pulsar-io-debezium-core</artifactId>
            <version>${project.version}</version>
        </dependency>

        <dependency>
            <groupId>io.debezium</groupId>
            <artifactId>debezium-connector-sqlserver</artifactId>
            <version>${debezium.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.nifi</groupId>
                <artifactId>nifi-nar-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

编写DebeziumSqlServerSource类

public class DebeziumSqlServerSource extends DebeziumSource {

    static private final String DEFAULT_TASK = "io.debezium.connector.sqlserver.SqlServerConnectorTask";


    public void setDbConnectorTask(Map<String, Object> config) throws Exception {
        throwExceptionIfConfigNotMatch(config, TaskConfig.TASK_CLASS_CONFIG, DEFAULT_TASK);
    }
}

编写debezium-sqlserver-source-config.yaml文件

  • 文件内容
tenant: "test"
namespace: "test-namespace"
name: "debezium-sqlserver-source"
topicName: "debezium-sqlserver-topic"
archive: "connectors/pulsar-io-debezium-sqlserver-1.0-SNAPSHOT.nar"

parallelism: 1

configs:
  database.hostname: "localhost"
  database.port: "1433"
  database.user: "sa"
  database.password: "Password!"
  database.dbname: "testDB"
  "database.server.name": "fullfillment"
  table.whitelist: "dbo.customers"

  pulsar.service.url: "pulsar://127.0.0.1:6650"
  database.history.pulsar.topic: "sqlserver-history-topic"
  offset.storage.topic: "sqlserver-offset-topic"
  • 文件存放路径
    在这里插入图片描述

编写pulsar-io.yaml文件

  • 文件内容
name: debezium-sqlserver
description: Debezium SqlServer Source
sourceClass: com.pulsar.io.debezium.sqlserver.DebeziumSqlServerSource
  • 文件存放路径
    在这里插入图片描述

创建pulsar-io-debezium-sqlserver-1.0.nar文件

  • 输入maven打包命令:
mvn clean install

文件打包完成后存放在target目录下。

上传文件至服务器

使用远程连接工具(Xftp或WinSCP)将打包好的nar文件上传至Apache Pulsar安装目录下的connectors目录下。

启动Debezium Connector

编写debezium-sqlserver-source-config.yaml文件

  • 文件内容
tenant: "public"
namespace: "default"
name: "debezium-sqlserver-source"
topicName: "debezium-sqlserver-topic"
archive: "connectors/pulsar-io-debezium-sqlserver-1.0-SNAPSHOT.nar" # nar文件路径

parallelism: 1

configs:
  database.hostname: "127.0.0.1" # 数据库地址
  database.port: "1433" # 数据库端口
  database.user: "username" # 用户名
  database.password: "password" # 密码
  database.server.name: "servername" # 服务名称,可以任意命名
  database.dbname: "dbname" # 需要捕获的数据库名城
  table.whitelist: "dbo.cm_logs" # 需要捕获的表名称,schemaName.tableName

  database.history: "org.apache.pulsar.io.debezium.PulsarDatabaseHistory"
  database.history.pulsar.topic: "mssqlserver6-history-topic"
  database.history.pulsar.service.url: "pulsar://127.0.0.1:6650"
  ## KEY_CONVERTER_CLASS_CONFIG, VALUE_CONVERTER_CLASS_CONFIG
  key.converter: "org.apache.kafka.connect.json.JsonConverter"
  value.converter: "org.apache.kafka.connect.json.JsonConverter"
  ## PULSAR_SERVICE_URL_CONFIG
  pulsar.service.url: "pulsar://127.0.0.1:6650"
  ## OFFSET_STORAGE_TOPIC_CONFIG
  offset.storage.topic: "offset-topic"

configs详细配置信息请参考:Debezium Connector for SQL Server

  • 文件存放路劲
    在这里插入图片描述

启动SQLServer Source 并订阅Topic

启动SQLServer Source

在服务器中输入:

$ bin/pulsar-admin source localrun  --source-config-file debezium-mysql-source-config.yaml

订阅Topic

PulsarClient client = PulsarClient.builder().serviceUrl("pulsar://127.0.0.1:6650").build();
        Consumer consumer = client.newConsumer(Schema.KeyValue(Schema.BYTES, Schema.BYTES))
                .topic("public/default/bigdata.dbo.cm_logs")
                .subscriptionName("sub-products")
                .subscribe();
        while (true) {
            try {
                Message msg = consumer.receive();
                KeyValue<byte[], byte[]> keyValues =  Schema.KeyValue(Schema.BYTES, Schema.BYTES).decode(msg.getData());
                JSONSchema<Map> jsonSchema = JSONSchema.of(SchemaDefinition.<Map>builder().withPojo(Map.class).build());
                Map keyResult = jsonSchema.decode(keyValues.getKey());
                Map valueResult = jsonSchema.decode(keyValues.getValue());
                // TODO("get data")
                System.out.println(keyResult);
                System.out.println(valueResult);
                consumer.acknowledge(msg);
            } catch (Exception e) {
                consumer.close();
                e.printStackTrace();
            }
        }

问题回顾

  • 问题一
    在这里插入图片描述
    解决方法
    开启SQLServer的CDC。(2008及之后的版本需要企业版)
    SQL代码如下:
USE [databaseName]
GO
EXEC sys.sp_cdc_enable_db
GO
-- 以下操作为需要捕获哪张表的数据就将表添加到CDC的作业任务中
USE [databaseName]
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'tableName',
@role_name = N'roleName'
GO

详细配置信息请参考:sys.sp_cdc_enable_table (Transact-SQL)

参考文档

Apache Pulsar:http://pulsar.apache.org/docs/en/io-cdc-debezium/
Apache Pulsar Git: https://github.com/apache/pulsar/tree/master/pulsar-io/debezium/mysql
debezium:https://debezium.io/docs/connectors/sqlserver/#setting-up-sqlserver
Microsoft:https://docs.microsoft.com/zh-cn/sql/relational-databases/system-stored-procedures/sys-sp-cdc-enable-table-transact-sql?view=sql-server-2017

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!