Array type in ClickHouseIO for Apache Beam (Dataflow)

Submitted by 江枫思渺然 on 2020-05-17 07:55:26

Question


I am using Apache Beam to consume JSON messages and insert them into ClickHouse.

I am currently having a problem with the Array data type.

Everything works fine until I add a field of array type:

Schema.Field.of("inputs.value", Schema.FieldType.array(Schema.FieldType.INT64).withNullable(true))

Code for the transformations:

p.apply(transformNameSuffix + "ReadFromPubSub",
        PubsubIO.readStrings()
                .fromSubscription(chainConfig.getPubSubSubscriptionPrefix() + "transactions")
                .withIdAttribute(PUBSUB_ID_ATTRIBUTE))
        // Beam expects each step name to be unique; "ParseJson" is an illustrative name
        .apply(transformNameSuffix + "ParseJson", ParDo.of(new DoFn<String, Row>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                String item = c.element();

                //System.out.print(item);
                Transaction transaction = JsonUtils.parseJson(item, Transaction.class);
                c.output(Row.withSchema(Schemas.TRANSACTIONS)
                        .addValues(*****,
                                   *****
                                   .......
                                   transaction.getInputValues())
                        .build());
            }
        }))
        .setRowSchema(Schemas.TRANSACTIONS)
        .apply(ClickHouseIO.<Row>write(
                        chainConfig.getClickhouseJDBCURI(),
                        chainConfig.getTransactionsTable())
                .withMaxRetries(3)
                .withMaxInsertBlockSize(1)
                .withInitialBackoff(Duration.standardSeconds(5))
                .withInsertDeduplicate(true)
                .withInsertDistributedSync(false));

The method that generates the input values:

public List<Long> getInputValues() {
    List<Long> values = Lists.newArrayList();

    for (TransactionInput eachInput : inputs) {
        System.out.print(eachInput.getValue());
        values.add(eachInput.getValue());
    }

    return values;
}
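Each element of the ClickHouse column is `Nullable(UInt64)`, so null values are legal inside the list, whereas ClickHouse does not allow the array itself to be NULL. A minimal sketch of the same method written with a stream, keeping null elements and returning an empty list when there are no inputs. `InputValuesSketch` and its `TransactionInput` are hypothetical stand-ins (the real class is not shown), and `getValue()` is assumed to return a boxed `Long`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class InputValuesSketch {

    // Hypothetical stand-in for the question's TransactionInput; the real class is not shown.
    public static class TransactionInput {
        private final Long value;

        public TransactionInput(Long value) {
            this.value = value;
        }

        public Long getValue() {
            return value;
        }
    }

    // Maps each input to its value. Null values stay as null elements
    // (matching Array(Nullable(UInt64))); a missing list becomes an empty
    // array, since a ClickHouse array column cannot itself be NULL.
    public static List<Long> getInputValues(List<TransactionInput> inputs) {
        if (inputs == null) {
            return new ArrayList<>();
        }
        return inputs.stream()
                .map(TransactionInput::getValue)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TransactionInput> inputs = Arrays.asList(
                new TransactionInput(5L), new TransactionInput(null), new TransactionInput(7L));
        System.out.println(getInputValues(inputs)); // [5, null, 7]
        System.out.println(getInputValues(null));   // []
    }
}
```

Returning an empty list rather than null is a design choice here: it keeps the Row value compatible with a non-nullable array column.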

The error I am getting is:

ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 33, host: 35.202.46.77, port: 8123; Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 6. Bytes expected: 15. (version 19.17.4.11 (official build))

    at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:58)
    at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:28)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.checkForErrorAndThrow(ClickHouseStatementImpl.java:875)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:851)
    at ru.yandex.clickhouse.Writer.send(Writer.java:106)
    at ru.yandex.clickhouse.Writer.send(Writer.java:141)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.sendRowBinaryStream(ClickHouseStatementImpl.java:764)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.sendRowBinaryStream(ClickHouseStatementImpl.java:758)
    at org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn.flush(ClickHouseIO.java:427)
    at org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn.processElement(ClickHouseIO.java:411)
    at org.apache.beam.sdk.io.clickhouse.AutoValue_ClickHouseIO_WriteFn$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:222)
    at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:183)
    at org.apache.beam.repackaged.direct_java.runners.core.SimplePushbackSideInputDoFnRunner.processElementInReadyWindows(SimplePushbackSideInputDoFnRunner.java:78)
    at org.apache.beam.runners.direct.ParDoEvaluator.processElement(ParDoEvaluator.java:216)
    at org.apache.beam.runners.direct.DoFnLifecycleManagerRemovingTransformEvaluator.processElement(DoFnLifecycleManagerRemovingTransformEvaluator.java:54)
    at org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
    at org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Throwable: Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 6. Bytes expected: 15. (version 19.17.4.11 (official build))

    at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:53)
    ... 22 more

Feb 06, 2020 9:04:38 PM org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn flush
WARNING: Error writing to ClickHouse. Retry attempt[1]
ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 33, host: 35.202.46.77, port: 8123; Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 6. Bytes expected: 93. (version 19.17.4.11 (official build))

    at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:58)
    at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:28)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.checkForErrorAndThrow(ClickHouseStatementImpl.java:875)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:851)
    at ru.yandex.clickhouse.Writer.send(Writer.java:106)
    at ru.yandex.clickhouse.Writer.send(Writer.java:141)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.sendRowBinaryStream(ClickHouseStatementImpl.java:764)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.sendRowBinaryStream(ClickHouseStatementImpl.java:758)
    at org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn.flush(ClickHouseIO.java:427)
    at org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn.processElement(ClickHouseIO.java:411)
    at org.apache.beam.sdk.io.clickhouse.AutoValue_ClickHouseIO_WriteFn$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:222)
    at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:183)
    at org.apache.beam.repackaged.direct_java.runners.core.SimplePushbackSideInputDoFnRunner.processElementInReadyWindows(SimplePushbackSideInputDoFnRunner.java:78)
    at org.apache.beam.runners.direct.ParDoEvaluator.processElement(ParDoEvaluator.java:216)
    at org.apache.beam.runners.direct.DoFnLifecycleManagerRemovingTransformEvaluator.processElement(DoFnLifecycleManagerRemovingTransformEvaluator.java:54)
    at org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
    at org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Throwable: Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 6. Bytes expected: 93. (version 19.17.4.11 (official build))

    at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:53)
    ... 22 more

Feb 06, 2020 9:04:39 PM org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn flush
WARNING: Error writing to ClickHouse. Retry attempt[1]
ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 33, host: 35.202.46.77, port: 8123; Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 5. Bytes expected: 2641. (version 19.17.4.11 (official build)
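Code 33 is ClickHouse's CANNOT_READ_ALL_DATA: while parsing the RowBinary stream, the server needed more bytes for a value than were left. One way to get there is the client encoding a column with a different layout than the server expects, so everything after it is read from the wrong offset. The sketch below is plain JDK (no Beam or ClickHouse classes; `RowBinaryMisread` is a made-up name) and assumes the RowBinary rules from ClickHouse's format documentation. It writes [1, 2, 3] as Array(UInt64), without null flags, and reads the bytes back as Array(Nullable(UInt64)), which is what the `inputs` Nested column in the schema is stored as. It illustrates the failure mode only; it does not show what ClickHouseIO actually sends for this column.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowBinaryMisread {

    // Writer side: Array(UInt64) is an unsigned LEB128 length followed by
    // one 8-byte little-endian value per element, with no NULL flags.
    public static byte[] writeArrayUInt64(List<Long> values) {
        ByteBuffer buf = ByteBuffer.allocate(10 + 8 * values.size()).order(ByteOrder.LITTLE_ENDIAN);
        long n = values.size();
        while ((n & ~0x7FL) != 0) {
            buf.put((byte) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        buf.put((byte) n);
        for (Long v : values) {
            buf.putLong(v);
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    // Reader side: Array(Nullable(UInt64)) expects a flag byte before each
    // element (1 = NULL, 0 = 8-byte value follows). ByteBuffer throws
    // BufferUnderflowException if the bytes run out mid-value.
    public static List<Long> readArrayNullableUInt64(ByteBuffer buf) {
        long n = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.get();
            n |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);

        List<Long> out = new ArrayList<>();
        for (long i = 0; i < n; i++) {
            if (buf.get() == 1) {
                out.add(null);
            } else {
                out.add(buf.getLong());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] sent = writeArrayUInt64(Arrays.asList(1L, 2L, 3L));
        ByteBuffer buf = ByteBuffer.wrap(sent).order(ByteOrder.LITTLE_ENDIAN);
        List<Long> decoded = readArrayNullableUInt64(buf);
        System.out.println("bytes sent:      " + sent.length);     // 25
        System.out.println("decoded as:      " + decoded);         // [null, 562949953421312, 3298534883328]
        System.out.println("bytes left over: " + buf.remaining()); // 5, would be read as the next column
    }
}
```

The decoded values are garbage and five bytes are left over; a real server would go on to read them as the next column, and depending on what follows it can end up asking for more bytes than the block contains.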

ClickHouse schema:

CREATE TABLE IF NOT EXISTS transactions_streaming_small ( 
  *****, 
  *****, 
  inputs Nested ( value Nullable(UInt64) ) ) 
ENGINE = MergeTree() PARTITION BY toYYYYMM(block_date_time)
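In ClickHouse, `inputs Nested ( value Nullable(UInt64) )` is stored as a column named `inputs.value` of type `Array(Nullable(UInt64))`: the array is not nullable, each element is. The Beam field `Schema.FieldType.array(Schema.FieldType.INT64).withNullable(true)` declares the opposite, a nullable array of non-null INT64s, and ClickHouse has no `Nullable(Array(...))` type to match it. The Beam type that literally mirrors the column would be `Schema.FieldType.array(Schema.FieldType.INT64.withNullable(true))`. ClickHouseIO sends rows as RowBinary (note `sendRowBinaryStream` in the stack trace), where the two array types take different numbers of bytes. A plain-JDK sketch of both layouts, written from the RowBinary description in ClickHouse's format documentation rather than from ClickHouseIO's own code; `RowBinaryLayouts` and its methods are made-up names:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

public class RowBinaryLayouts {

    // Unsigned LEB128, which RowBinary uses for array lengths.
    static void writeLeb128(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // UInt64 (and Int64): 8 bytes, little-endian.
    static void writeUInt64(ByteArrayOutputStream out, long value) {
        for (int i = 0; i < 8; i++) {
            out.write((int) (value >>> (8 * i)) & 0xFF);
        }
    }

    // Array(UInt64): length, then the values. This layout cannot encode a NULL
    // element; unboxing a null Long here throws NullPointerException.
    public static byte[] arrayOfUInt64(List<Long> values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLeb128(out, values.size());
        for (Long v : values) {
            writeUInt64(out, v);
        }
        return out.toByteArray();
    }

    // Array(Nullable(UInt64)), the type of inputs.value: length, then per element
    // a flag byte (1 = NULL, 0 = value follows) and, if not NULL, the 8-byte value.
    public static byte[] arrayOfNullableUInt64(List<Long> values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLeb128(out, values.size());
        for (Long v : values) {
            if (v == null) {
                out.write(1);
            } else {
                out.write(0);
                writeUInt64(out, v);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(arrayOfUInt64(Arrays.asList(1L, 2L, 3L)).length);           // 25
        System.out.println(arrayOfNullableUInt64(Arrays.asList(1L, 2L, 3L)).length);   // 28
        System.out.println(arrayOfNullableUInt64(Arrays.asList(1L, null, 3L)).length); // 20
    }
}
```

For [1, 2, 3] the nullable-element layout is three bytes longer, one flag byte per element, so a writer and a reader that disagree about the element type drift out of step from that column onward.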

What is the problem?

[ClickHouse version 19.17.4.11 (official build)]

Source: https://stackoverflow.com/questions/60098740/array-type-in-clickhouseio-for-apache-beamdataflow
