Data is duplicated when I create a flattened stream

Submitted by 人盡茶涼 on 2020-01-25 06:52:05

Question


I have a stream derived from a topic that contains 271 messages in total, and the stream also reports 271 total messages. But when I create another stream from that stream to flatten it, the new stream reports a consumer total of 542 messages (271 × 2).
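One way to check whether the duplication is in the data itself or only in the runtime statistics is to print both the source topic and the derived topic from the beginning and count the rows (a sketch, assuming a ksqlDB/KSQL CLI session; the offset reset is only needed for queries, not for `PRINT`):

```sql
-- Read everything from the start so counts are complete
SET 'auto.offset.reset' = 'earliest';

-- Print each backing topic from the beginning and count the rows by eye
PRINT 'mongo_conn.digi.transactions' FROM BEGINNING;
PRINT 'TRANSACTIONSRAW' FROM BEGINNING;
```

If both topics show 271 rows, the doubling exists only in the consumer statistics, not in the data.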

This is the stream derived from the topic:

 Name                 : TRANSACTIONSPURE
 Type                 : STREAM
 Key field            : 
 Key format           : STRING
 Timestamp field      : Not set - using <ROWTIME>
 Value format         : JSON
 Kafka topic          : mongo_conn.digi.transactions (partitions: 1, replication: 1)

  Field   | Type
 -----------------------------------
  ROWTIME | BIGINT           (system)
  ROWKEY  | VARCHAR(STRING)  (system)
  PAYLOAD | STRUCT<SENDER VARCHAR(STRING), RECEIVER VARCHAR(STRING), RECEIVERWALLETID VARCHAR(STRING), STATUS VARCHAR(STRING), TYPE VARCHAR(STRING), AMOUNT DOUBLE, TOTALFEE DOUBLE, CREATEDAT VARCHAR(STRING), UPDATEDAT VARCHAR(STRING), ID VARCHAR(STRING), ORDERID VARCHAR(STRING), __V VARCHAR(STRING), TXID VARCHAR(STRING), SENDERWALLETID VARCHAR(STRING)>
 -----------------------------------

 Local runtime statistics
 ------------------------
 consumer-messages-per-sec: 0   consumer-total-bytes: 361356   consumer-total-messages: 271   last-message: 2019-09-02T10:44:14.003Z

And this is the flattened stream derived from it:

 Name                 : TRANSACTIONSRAW
 Type                 : STREAM
 Key field            : 
 Key format           : STRING
 Timestamp field      : Not set - using <ROWTIME>
 Value format         : JSON
 Kafka topic          : TRANSACTIONSRAW (partitions: 4, replication: 1)

  Field            | Type                      
 ----------------------------------------------
  ROWTIME          | BIGINT           (system) 
  ROWKEY           | VARCHAR(STRING)  (system) 
  SENDER           | VARCHAR(STRING)           
  RECEIVER         | VARCHAR(STRING)           
  RECEIVERWALLETID | VARCHAR(STRING)           
  STATUS           | VARCHAR(STRING)           
  TYPE             | VARCHAR(STRING)           
  AMOUNT           | DOUBLE                    
  TOTALFEE         | DOUBLE                    
  CREATEDAT        | VARCHAR(STRING)           
  UPDATEDAT        | VARCHAR(STRING)           
  ID               | VARCHAR(STRING)           
  ORDERID          | VARCHAR(STRING)           
  __V              | VARCHAR(STRING)           
  TXID             | VARCHAR(STRING)           
  SENDERWALLETID   | VARCHAR(STRING)           
 ----------------------------------------------

 Queries that write into this STREAM
 -----------------------------------
 CSAS_TRANSACTIONSRAW_10 :
   CREATE STREAM transactionsraw WITH (value_format='JSON') AS
   SELECT payload->sender AS sender,
          payload->receiver AS receiver,
          payload->receiverWalletId AS receiverWalletId,
          payload->status AS status,
          payload->type AS type,
          payload->amount AS amount,
          payload->totalFee AS totalFee,
          payload->createdAt AS createdAt,
          payload->updatedAt AS updatedAt,
          payload->id AS id,
          payload->orderId AS orderId,
          payload->__v AS __v,
          payload->txId AS txId,
          payload->senderWalletId AS senderWalletId
   FROM transactionspure;

For query topology and execution plan please run: EXPLAIN <QueryId>

Local runtime statistics
------------------------
 consumer-messages-per-sec: 0   consumer-total-bytes: 315500   consumer-total-messages: 542
 messages-per-sec: 0   total-messages: 271   last-message: 2019-09-02T10:44:15.493Z

Source: https://stackoverflow.com/questions/57770983/data-is-duplicated-when-i-create-a-flattened-stream
