Setting Custom Coders & Handling Parameterized Types

隐瞒了意图╮ 2021-01-19 00:25

I have two questions related to coder issues I am facing with my Dataflow pipeline.

  • How do I go about setting a coder for my custom data types? The class consi
1 Answer
  • 2021-01-19 00:53

    It looks like you have been bitten by two issues. Thanks for bringing them to our attention! Fortunately, there are easy workarounds for both while we improve things.

    The first issue is that the default coder registry does not have an entry for mapping Set.class to SetCoder. We have filed GitHub issue #56 to track its resolution. In the meantime, you can use the following code to perform the needed registration:

    pipeline.getCoderRegistry().registerCoder(Set.class, SetCoder.class);
    

    The second issue is that parameterized types currently require advanced treatment in the coder registry, so the @DefaultCoder annotation will not be honored. We have filed GitHub issue #57 to track this. The best way to ensure that SerializableCoder is used everywhere for CustomType is to register a CoderFactory for your type that returns a SerializableCoder. Suppose your type looks something like this:

    public class CustomType<T extends Serializable> implements Serializable {
      T field;
    }
    

    Then the following code registers a CoderFactory that produces appropriate SerializableCoder instances:

    pipeline.getCoderRegistry().registerCoder(CustomType.class, new CoderFactory() {
      @Override
      public Coder<?> create(List<? extends Coder<?>> componentCoders) {
        // No matter what the T is, return SerializableCoder
        return SerializableCoder.of(CustomType.class);
      }
    
      @Override
      public List<Object> getInstanceComponents(Object value) {
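        // Return the T inside your CustomType<T> to enable coder inference for Create.
        // A wildcard cast is used because CustomType<Object> would violate the
        // "T extends Serializable" bound on the type parameter.
        return Collections.<Object>singletonList(((CustomType<?>) value).field);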
      }
    });
    

    Now, whenever you use CustomType in your pipeline, the coder registry will produce a SerializableCoder.

    Note that SerializableCoder is not deterministic: two objects that are equal according to equals() do not necessarily encode to the same bytes. Values encoded with this coder therefore cannot be used as keys in a GroupByKey operation.
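
To make the determinism caveat concrete, here is a minimal sketch that uses only the JDK, on the assumption that SerializableCoder encodes values with standard Java serialization. The class name SerializationDeterminismDemo and its helper methods are hypothetical and exist only for this illustration; they are not part of the Dataflow SDK. It builds two sets that are equal according to equals() but were filled in a different order, then compares their serialized bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical demo class, not part of any SDK.
public class SerializationDeterminismDemo {

  // Encode a value with plain Java serialization.
  public static byte[] serialize(Object value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Build a set that remembers the insertion order of its elements.
  public static Set<String> orderedSet(String... elements) {
    return new LinkedHashSet<>(Arrays.asList(elements));
  }

  public static void main(String[] args) {
    Set<String> first = orderedSet("a", "b");
    Set<String> second = orderedSet("b", "a");

    // Set equality ignores insertion order, so the two sets are equal.
    System.out.println("equals(): " + first.equals(second));

    // Serialization writes elements in iteration order, so the bytes differ.
    System.out.println(
        "same bytes: " + Arrays.equals(serialize(first), serialize(second)));
  }
}
```

Running this should print "equals(): true" followed by "same bytes: false". A grouping operation that compares keys by their encoded bytes would treat these two equal sets as different keys, which is why a key type needs a deterministic coder.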
