Spring Cloud Kafka: Can't serialize data for output stream when two processors are active

Submitted by ☆樱花仙子☆ on 2021-01-28 05:13:21

Question


I have a working Spring Cloud Kafka Streams setup using the functional programming style. There are two use cases, configured via application.properties. Each works on its own, but as soon as I activate both at the same time, I get a serialization error for the output stream of the second use case:

Exception in thread "ActivitiesAppId-05296224-5ea1-412a-aee4-1165870b5c75-StreamThread-1" org.apache.kafka.streams.errors.StreamsException:
Error encountered sending record to topic outputActivities for task 0_0 due to:
...
Caused by: org.apache.kafka.common.errors.SerializationException:
Can't serialize data [com.example.connector.model.Activity@497b37ff] for topic [outputActivities]
Caused by: com.fasterxml.jackson.databind.exc.InvalidDefinitionException:
Incompatible types: declared root type ([simple type, class com.example.connector.model.Material]) vs com.example.connector.model.Activity

The last line here is important: the "declared root type" is the Material class rather than the Activity class, which is probably the source of the error.

Again, when I activate only the second use case before starting the application, everything works fine. So I assume that the "Materials" processor somehow interferes with the "Activities" processor (or its serializer), but I don't know when or where.


Setup

1.) use case: "Materials"

  • one input stream -> transformation -> one output stream
@Bean
public Function<KStream<String, MaterialRaw>, KStream<String, Material>> processMaterials() {...}

application.properties

spring.cloud.stream.kafka.streams.binder.functions.processMaterials.applicationId=MaterialsAppId
spring.cloud.stream.bindings.processMaterials-in-0.destination=inputMaterialsRaw
spring.cloud.stream.bindings.processMaterials-out-0.destination=outputMaterials

2.) use case: "Activities"

  • two input streams -> joining -> one output stream
@Bean
public BiFunction<KStream<String, ActivityRaw>, KStream<String, Assignee>, KStream<String, Activity>> processActivities() {...}

application.properties

spring.cloud.stream.kafka.streams.binder.functions.processActivities.applicationId=ActivitiesAppId
spring.cloud.stream.bindings.processActivities-in-0.destination=inputActivitiesRaw
spring.cloud.stream.bindings.processActivities-in-1.destination=inputAssignees
spring.cloud.stream.bindings.processActivities-out-0.destination=outputActivities

The two processors are also declared as stream functions in application.properties: spring.cloud.stream.function.definition=processActivities;processMaterials
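For reference, this is the combined application.properties when both use cases are active, assembled from the snippets above:

spring.cloud.stream.function.definition=processActivities;processMaterials

spring.cloud.stream.kafka.streams.binder.functions.processMaterials.applicationId=MaterialsAppId
spring.cloud.stream.bindings.processMaterials-in-0.destination=inputMaterialsRaw
spring.cloud.stream.bindings.processMaterials-out-0.destination=outputMaterials

spring.cloud.stream.kafka.streams.binder.functions.processActivities.applicationId=ActivitiesAppId
spring.cloud.stream.bindings.processActivities-in-0.destination=inputActivitiesRaw
spring.cloud.stream.bindings.processActivities-in-1.destination=inputAssignees
spring.cloud.stream.bindings.processActivities-out-0.destination=outputActivities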

Thanks!

Update - Here's how I use the processors in the code:

Implementation

// Material model
@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
public class MaterialRaw {
    private String id;
    private String name;
}

@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
public class Material {
    private String id;
    private String name;
}

// Material processor
@Bean
public Function<KStream<String, MaterialRaw>, KStream<String, Material>> processMaterials() {
    return materialsRawStream -> materialsRawStream.map((recordKey, materialRaw) -> {
        // some transformation
        final var newId = materialRaw.getId() + "---foo";
        final var newName = materialRaw.getName() + "---bar";
        final var material = new Material(newId, newName);

        // output
        return new KeyValue<>(recordKey, material);
    });
}
// Activity model
@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
public class ActivityRaw {
    private String id;
    private String name;
}

@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
public class Assignee {
    private String id;
    private String assignedAt;
}

/**
 * Combination of `ActivityRaw` and `Assignee`
 */
@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
public class Activity {
    private String id;
    private Integer number;
    private String assignedAt;
}

// Activity processor
@Bean
public BiFunction<KStream<String, ActivityRaw>, KStream<String, Assignee>, KStream<String, Activity>> processActivities() {
    return (activitiesRawStream, assigneesStream) -> { 
        final var joinWindow = JoinWindows.of(Duration.ofDays(30));

        final var streamJoined = StreamJoined.with(
            Serdes.String(),
            new JsonSerde<>(ActivityRaw.class),
            new JsonSerde<>(Assignee.class)
        );

        final var joinedStream = activitiesRawStream.leftJoin(
            assigneesStream,
            new ActivityJoiner(),
            joinWindow,
            streamJoined
        );

        final var mappedStream = joinedStream.map((recordKey, activity) -> {
            return new KeyValue<>(recordKey, activity);
        });

        return mappedStream;
    };
}
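ActivityJoiner is not shown in the question; here is a minimal sketch of what it might look like, assuming it implements ValueJoiner<ActivityRaw, Assignee, Activity> (how `number` is derived is not shown, so it is left null here):

import org.apache.kafka.streams.kstream.ValueJoiner;

public class ActivityJoiner implements ValueJoiner<ActivityRaw, Assignee, Activity> {
    @Override
    public Activity apply(ActivityRaw activityRaw, Assignee assignee) {
        // assignee is null when no matching record arrived within the join window (left join)
        final var assignedAt = assignee != null ? assignee.getAssignedAt() : null;
        // `number` cannot be derived from the models shown in the question
        return new Activity(activityRaw.getId(), null, assignedAt);
    }
}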

Answer 1:


This turns out to be an issue with the way the binder infers Serde types when there are multiple functions with different outbound target types (in your case, one with Activity and the other with Material). We will have to address this in the binder. I created an issue here.

In the meantime, you can follow this workaround.

Create a custom Serde class as below.

public class ActivitySerde extends JsonSerde<Activity> {}
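The concrete subclass matters here: the binder creates the Serde through its no-arg constructor, and the Activity target type can then be resolved from the class's generic superclass.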

Then, explicitly use this Serde for the outbound of your processActivities function using configuration.

For example:

spring.cloud.stream.kafka.streams.bindings.processActivities-out-0.producer.valueSerde=com.example.so65003575.ActivitySerde

Please change the package to the appropriate one if you are trying this workaround.

Here is another, recommended approach: if you define a bean of type Serde with the target type, it takes precedence, because the binder matches Serde beans against the KStream's type. That way, you can do it without defining the extra class from the workaround above.

@Bean
public Serde<Activity> activitySerde() {
    return new JsonSerde<>(Activity.class);
}
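Since the match is done by type, the same pattern should cover the other function's outbound as well; a sketch based on the matching rule described above:

@Bean
public Serde<Material> materialSerde() {
    return new JsonSerde<>(Material.class);
}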

Here are the docs, which explain all these details.




Answer 2:


You need to specify which binder to use for each function binding: spring.cloud.stream.bindings.xxx.binder=....

However, without that, I would have expected an error such as "multiple binders found but no default specified", which is what happens with message channel binders.
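For example, assuming the default binder names (the Kafka Streams binder registers itself as kstream), it would look like this:

spring.cloud.stream.bindings.processMaterials-in-0.binder=kstream
spring.cloud.stream.bindings.processMaterials-out-0.binder=kstream
spring.cloud.stream.bindings.processActivities-in-0.binder=kstream
spring.cloud.stream.bindings.processActivities-in-1.binder=kstream
spring.cloud.stream.bindings.processActivities-out-0.binder=kstream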



Source: https://stackoverflow.com/questions/65003575/spring-cloud-kafka-cant-serialize-data-for-output-stream-when-two-processors-a
