Apache Beam stream processing of json data

前端 未结 1 1649
盖世英雄少女心
盖世英雄少女心 2021-01-22 05:16

I am analyzing Apache Beam stream processing of data. I have worked on Apache Kafka stream processing (Producer, Consumer etc). I want to compare it with Beam now.

I wan

1条回答
  •  深忆病人
    2021-01-22 05:27

    There are multiple aspects of this:

    • first you need to establish where the data is coming from:
      • you need to use some kind of IO in Beam pipeline, see here;
      • there are a bunch of built in IOs, see the list here;
      • by using an IO from the above link you will likely get a stream of strings containing those JSON objects;
      • some IOs can natively parse Avro and other formats (PubsubIO), this depends on specific IO implementation;
    • then you may need to transform the data:

      • you will need to create your own PTransform which handles the conversion from a JSON string to your Java class:
        • see the section about PTransforms here;
      • you can see an example of such transform here:
        • this JsonToRow PTransform accepts a string with JSON object and converts it to a Beam Row using Jackson ObjectMapper;
        • you can either try using the Row object yourself, or you can implement a similar transform to convert JSON strings to your custom Java type instead of Row;
    • you may also take a look at examples folder in Beam source;

    0 讨论(0)
提交回复
热议问题