How to combine streaming data with large history data set in Dataflow/Beam

Asked by 有刺的猬 on 2020-12-31 08:42

I am investigating processing logs from web user sessions via Google Dataflow/Apache Beam and need to combine the user's logs as they come in (streaming) with the history of that user's sessions.

1 Answer
  • Answered 2020-12-31 09:24

    There is currently no way to access per-key side inputs in streaming, but it would definitely be useful exactly as you describe, and it is something we are considering implementing.

    One possible workaround is to use the side inputs to distribute pointers to the actual session history. The code generating the 24h session histories could upload them to GCS/BigQuery/etc, then send the locations as a side input to the joining code.
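    The pointer-distribution pattern can be illustrated with a minimal sketch outside Beam. All names here (`HISTORY_POINTERS`, `fetch_history`, `join_with_history`, the bucket paths) are hypothetical; in a real pipeline the pointer map would arrive as a side input and `fetch_history` would read from GCS or BigQuery inside the streaming DoFn:

    ```python
    # Sketch of the "pointers as side input" workaround.
    # A batch job writes each user's 24h session history to storage and
    # publishes only the object path; the streaming join looks the path
    # up per element and loads the history on demand.

    # Side input: user id -> location of that user's history blob.
    # Paths are hypothetical placeholders.
    HISTORY_POINTERS = {
        "user-1": "gs://my-bucket/histories/user-1.json",
    }

    # Stand-in for a GCS/BigQuery read keyed by object location.
    _FAKE_STORAGE = {
        "gs://my-bucket/histories/user-1.json": ["page_a", "page_b"],
    }

    def fetch_history(location):
        """Return the stored session history for a pointer location."""
        return _FAKE_STORAGE.get(location, [])

    def join_with_history(event, pointers):
        """What the streaming join step would do per element:
        resolve the pointer for the event's user, fetch the history,
        and emit the event enriched with it."""
        location = pointers.get(event["user"])
        history = fetch_history(location) if location else []
        return {**event, "history": history}

    enriched = join_with_history({"user": "user-1", "page": "page_c"},
                                 HISTORY_POINTERS)
    print(enriched["history"])
    ```

    Keeping only pointers in the side input keeps it small and cheap to refresh, while the (potentially large) histories stay in external storage and are fetched lazily per key.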
