I have a pipeline that successfully outputs an Avro file as follows:
@DefaultCoder(AvroCoder.class)
class MyOutput_T_S {
    T foo;
    S bar;
    Boolean baz;
    publi
I think there are two questions (correct me if I am wrong):
1. How do I get a coder for MyOutput<T, S>?
2. How do I write MyOutput<T, S> to a file using AvroIO.Write?
The first question is solved by registering a CoderFactory, as in the linked question you found.
Your naive coder is probably allowing you to run the pipeline without issues because serialization is being optimized away. Certainly an Avro schema with no fields will result in those fields being dropped in a serialization+deserialization round trip.
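To make the round-trip point concrete, here is a toy sketch in plain Java. It is not Avro and calls no Avro API; the names SchemaRoundTrip, encode, decode and roundTrip are made up for illustration. It only mimics the one property that matters here: a schema-driven writer emits nothing but the fields the schema lists, so an empty schema yields an empty record after decoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for schema-driven serialization (hypothetical, NOT the Avro API).
final class SchemaRoundTrip {

    // "Serialize": keep only the values of fields named in the schema, in schema order.
    static List<Object> encode(List<String> schemaFields, Map<String, Object> record) {
        List<Object> wire = new ArrayList<>();
        for (String field : schemaFields) {
            wire.add(record.get(field));
        }
        return wire;
    }

    // "Deserialize": rebuild a record from the wire values using the same schema.
    static Map<String, Object> decode(List<String> schemaFields, List<Object> wire) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (int i = 0; i < schemaFields.size(); i++) {
            record.put(schemaFields.get(i), wire.get(i));
        }
        return record;
    }

    static Map<String, Object> roundTrip(List<String> schemaFields, Map<String, Object> record) {
        return decode(schemaFields, encode(schemaFields, record));
    }

    public static void main(String[] args) {
        Map<String, Object> value = new LinkedHashMap<>();
        value.put("foo", "hello");
        value.put("bar", 42);
        value.put("baz", true);

        // Empty schema: every field is silently dropped.
        System.out.println(roundTrip(Collections.<String>emptyList(), value)); // {}
        // Full schema: the record survives intact.
        System.out.println(roundTrip(Arrays.asList("foo", "bar", "baz"), value)); // {foo=hello, bar=42, baz=true}
    }
}
```

If the pipeline never actually serializes the value between steps, the empty-schema case goes unnoticed, which would be consistent with the behaviour you describe.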
But assuming you fill in the schema with the fields, your approach to CoderFactory#create looks right. I don't know the exact cause of the message java.lang.IllegalArgumentException: Unable to get field id from class null, but the call to AvroCoder.of(MyOutput.class, schema) should work for an appropriately assembled schema. If there is an issue with this, more details (such as the rest of the stack trace) would be helpful.
However, your override of CoderFactory#getInstanceComponents should return a list of values, one per type parameter of MyOutput. Like so:
@Override
public List<Object> getInstanceComponents(Object value) {
    MyOutput<Object, Object> myOutput = (MyOutput<Object, Object>) value;
    return ImmutableList.of(myOutput.foo, myOutput.bar);
}
The second question can be answered using some of the same support code as the first, but is otherwise independent. AvroIO.Write.withSchema always explicitly uses the provided schema. It does use AvroCoder under the hood, but this is actually an implementation detail. Providing a compatible schema is all that is necessary; such a schema has to be composed for each combination of T and S for which you want to output MyOutput<T, S>.
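As a rough sketch of composing one schema per combination of T and S, the helper below builds the Avro schema as JSON text using only the JDK. The class MyOutputSchemas and the method schemaJsonFor are hypothetical names, not part of any SDK. The record name "MyOutput", the absence of a namespace, and treating baz as a non-null boolean are all assumptions; adjust them to match your class (for example, use a ["null","boolean"] union if baz can be null).

```java
// Hypothetical helper: composes an Avro record schema (as JSON text)
// for one concrete choice of T and S.
final class MyOutputSchemas {

    // fooType and barType are Avro schema JSON fragments for T and S,
    // e.g. "\"string\"" for a primitive or a full nested record definition.
    static String schemaJsonFor(String fooType, String barType) {
        return "{\"type\":\"record\",\"name\":\"MyOutput\",\"fields\":["
            + "{\"name\":\"foo\",\"type\":" + fooType + "},"
            + "{\"name\":\"bar\",\"type\":" + barType + "},"
            // Assumption: baz is never null; otherwise use ["null","boolean"].
            + "{\"name\":\"baz\",\"type\":\"boolean\"}"
            + "]}";
    }

    public static void main(String[] args) {
        // Schema for MyOutput<String, Long>.
        System.out.println(schemaJsonFor("\"string\"", "\"long\""));
    }
}
```

The resulting text should be parseable with Avro's new Schema.Parser().parse(json), and the parsed Schema is what you would hand to AvroIO.Write.withSchema and to AvroCoder.of(MyOutput.class, schema).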