Question
I have read a lot about the global state store: it does not create a changelog topic for restore, but instead uses the source topic itself for restore.
I create a custom key and store the data in the global state store, but after a restart the data is gone, because on restore the global store reads data directly from the source topic and bypasses the processor.
My input topic has data like the following:
{
  "id": "user-12345",
  "user_client": [
    "clientid-1",
    "clientid-2"
  ]
}
I am maintaining two state stores as follows:
- id -> record (record means the JSON above)
- clientid-1: ["user-12345"] (clientid -> user-id)
- clientid-2: ["user-12345"] (clientid -> user-id)
The workaround I have seen is to create a custom changelog topic and send keyed data to it, so that this topic acts as the source topic for the global state store.
But in my scenario I have to fill two kinds of records into the state store. What is the best way to do that?
Example Scenario:
Record1: {
  "id": "user-1",
  "user_client": [
    "clientid-1",
    "clientid-2"
  ]
}
Record2: {
  "id": "user-2",
  "user_client": [
    "clientid-1",
    "clientid-3"
  ]
}
The global state store should then contain:
id -> JSON record
clientid-1: ["user-1", "user-2"]
clientid-2: ["user-1"]
clientid-3: ["user-2"]
How do I handle the restore case for the above scenario in the global state store?
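For reference, the kind of setup I mean is roughly the following simplified sketch; "input-topic" and "client-to-users" are placeholder names, the serdes are plain Strings here, and it uses the pre-2.7 Processor API for brevity:
// Sketch only: placeholder names, plain String serdes
StreamsBuilder streamsBuilder = new StreamsBuilder();

StoreBuilder<KeyValueStore<String, String>> storeBuilder =
        Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("client-to-users"),
                Serdes.String(), Serdes.String())
                .withLoggingDisabled(); // global stores must have logging disabled

streamsBuilder.addGlobalStore(
        storeBuilder,
        "input-topic",
        Consumed.with(Serdes.String(), Serdes.String()),
        () -> new Processor<String, String>() {
            private KeyValueStore<String, String> store;

            @SuppressWarnings("unchecked")
            @Override
            public void init(ProcessorContext context) {
                store = (KeyValueStore<String, String>) context.getStateStore("client-to-users");
            }

            @Override
            public void process(String key, String value) {
                // custom re-keying happens here (parse the JSON, write clientid -> user-id entries)
                store.put(key, value);
            }

            @Override
            public void close() { }
        });
On restart, the restore path copies key/value pairs from input-topic straight into the store, so the re-keying in process() never runs.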
Answer 1:
One approach is to maintain a changelog topic (with cleanup.policy=compact) for the GlobalKTable; let's call it user_client_global_ktable_changelog. For the sake of simplicity, let's say we deserialize your messages into Java classes (you could just use a HashMap or JsonNode or something):
// initial message format
public class UserClients {
    String id;
    Set<String> userClient;
    // getters/setters omitted
}
// message format when the key is a client id
public class ClientUsers {
    String clientId;
    Set<String> userIds = new HashSet<>(); // initialized so the aggregator below can add to it
    // getters/setters omitted
}
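For the "please provide appropriate serdes" parts below, one option (my assumption, not something the original answer specifies) is a small Jackson-based JSON serde; the JsonSerdes helper here is hypothetical and assumes a kafka-clients version where Serializer/Deserializer have default methods:
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;

// Hypothetical helper: builds a JSON Serde for any of the POJOs above
public class JsonSerdes {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static <T> Serde<T> forClass(Class<T> clazz) {
        return Serdes.serdeFrom(
                (topic, data) -> {                      // serializer
                    try {
                        return data == null ? null : MAPPER.writeValueAsBytes(data);
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                },
                (topic, bytes) -> {                     // deserializer
                    try {
                        return bytes == null ? null : MAPPER.readValue(bytes, clazz);
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                });
    }
}

// usage
Serde<UserClients> userClientsSerde = JsonSerdes.forClass(UserClients.class);
Serde<ClientUsers> clientUsersSerde = JsonSerdes.forClass(ClientUsers.class);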
//your initial topic
KStream<String, UserClients> userClientKStream = streamsBuilder.stream("un_keyed_topic");
- It is easy to re-key the record to user_id: just re-key the KStream and send it to the output topic (a serde-explicit sketch follows the snippet):
// re-map the initial message to user_id:{initial_message_payload}
userClientKStream
        .map((defaultNullKey, userClients) -> KeyValue.pair(userClients.getId(), userClients))
        .to("user_client_global_ktable_changelog"); // please provide appropriate serdes
- To aggregate user_ids for a particular client, we can use a local state store (KTable) to keep the current list of user_ids for each client_id (again, a serde-explicit sketch follows the snippet):
userClientKStream
        // re-keying by client id causes a re-partition before groupByKey
        // (an internal ...-repartition topic is created)
        .flatMap((defaultNullKey, userClients) -> userClients.getUserClient().stream()
                .map(clientId -> KeyValue.pair(clientId, userClients.getId()))
                .collect(Collectors.toList()))
        // we have to maintain a currently aggregated list of user_ids per client_id
        .groupByKey()
        .aggregate(ClientUsers::new, (clientId, userId, clientUsers) -> {
            clientUsers.getUserIds().add(userId);
            return clientUsers;
        }, Materialized.as("client_with_aggregated_user_ids"))
        .toStream()
        .to("user_client_global_ktable_changelog"); // please provide appropriate serdes
E.g., aggregating user_ids in the local state:
// re-keyed client-based message
clientid-1:user-1
// current aggregate for `clientid-1`
"clientid-1"
{
  "user_ids": ["user-1"]
}
// re-keyed client-based message
clientid-1:user-2
// current aggregate for `clientid-1`
"clientid-1"
{
  "user_ids": ["user-1", "user-2"]
}
Actually, you could use the changelog topic of the local state store directly as the changelog for the GlobalKTable, i.e. the topic <application.id>-client_with_aggregated_user_ids-changelog, if you adjust the state to keep both the user-keyed payload and the client-keyed payload.
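To round this out (not something the original answer spells out): the compacted topic can then be consumed as the GlobalKTable. Since the topic carries both user-keyed and client-keyed payloads, the sketch below assumes a generic JsonNode value serde built with the hypothetical JsonSerdes helper from above:
// Sketch: read the compacted changelog topic into the global store
Serde<JsonNode> jsonNodeSerde = JsonSerdes.forClass(JsonNode.class);

GlobalKTable<String, JsonNode> userClientGlobalTable = streamsBuilder.globalTable(
        "user_client_global_ktable_changelog",
        Consumed.with(Serdes.String(), jsonNodeSerde),
        Materialized.<String, JsonNode, KeyValueStore<Bytes, byte[]>>as("user_client_global_store")
                .withKeySerde(Serdes.String())
                .withValueSerde(jsonNodeSerde));

// The store now answers both kinds of lookups, e.g.
//   store.get("user-1")     -> the full user record
//   store.get("clientid-1") -> the aggregated list of user ids
On restart, the global store is rebuilt directly from user_client_global_ktable_changelog, which already contains correctly keyed records, so nothing is lost.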
Source: https://stackoverflow.com/questions/60613596/global-state-store-dont-create-change-log-topic-what-is-the-workaround-if-input