It is possible to read unnested JSON files on Cloud Storage with Dataflow via:
p.apply("read logfiles", TextIO.Read.from("gs://bucket/*").withCoder(Tabl
Your best bet is probably to do what you described in #2 and use Jackson directly. It makes the most sense to let the TextIO read do what it is built for -- reading lines from a file with the string coder -- and then use a DoFn to actually parse the elements. Something like the following:
PCollection<String> lines = pipeline
    .apply(TextIO.Read.from("gs://bucket/..."));

PCollection<TableRow> objects = lines
    .apply(ParDo.of(new DoFn<String, TableRow>() {
      @Override
      public void processElement(ProcessContext c) {
        String json = c.element();
        SomeObject object = /* parse json using Jackson, etc. */;
        TableRow row = /* create a table row from object */;
        c.output(row);
      }
    }));
Note that you could also split this across multiple ParDos, e.g. one that parses each JSON line into an intermediate object and a second that converts that object into a TableRow.
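As a rough sketch of that two-ParDo variant -- assuming the pre-2.x Dataflow SDK, Jackson on the worker classpath, and a hypothetical POJO LogEntry (with getTimestamp()/getMessage() getters) that matches your JSON schema:

// Step 1: parse each JSON line into a LogEntry with Jackson.
PCollection<LogEntry> parsed = lines
    .apply(ParDo.of(new DoFn<String, LogEntry>() {
      // ObjectMapper is not serializable, so create it on the worker
      // rather than capturing it in the DoFn's serialized state.
      private transient ObjectMapper mapper;

      @Override
      public void startBundle(Context c) {
        mapper = new ObjectMapper();
      }

      @Override
      public void processElement(ProcessContext c) throws Exception {
        c.output(mapper.readValue(c.element(), LogEntry.class));
      }
    }));

// Step 2: convert each LogEntry into a BigQuery TableRow.
// The field names here ("timestamp", "message") are placeholders for
// whatever columns your table actually has.
PCollection<TableRow> rows = parsed
    .apply(ParDo.of(new DoFn<LogEntry, TableRow>() {
      @Override
      public void processElement(ProcessContext c) {
        LogEntry entry = c.element();
        c.output(new TableRow()
            .set("timestamp", entry.getTimestamp())
            .set("message", entry.getMessage()));
      }
    }));

For this to work, the intermediate LogEntry type needs a coder the pipeline can use -- e.g. make it Serializable and register SerializableCoder for it, or annotate it with @DefaultCoder. The upside of splitting the steps is that the parse stage is reusable and each stage is easier to test in isolation.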