I have input lines like below
t1, file1, 1, 1, 1
t1, file1, 1, 2, 3
t1, file2, 2, 2, 2, 2
t2, file1, 5, 5, 5
t2, file2, 1, 1, 2, 2
<
What you're looking for is updateStateByKey
. For DStream[(T, U)]
it should take a function with two arguments:
Seq[U]
- representing state for current windowOption[U]
- representing accumulated stateand return Option[U]
.
Given your code it could be implemented for example like this:
import breeze.linalg.{DenseVector => BDV}
import scala.util.Try
val state: DStream[(String, Array[Int])] = parsedStream.updateStateByKey(
(current: Seq[Array[Int]], prev: Option[Array[Int]]) => {
prev.map(_ +: current).orElse(Some(current))
.flatMap(as => Try(as.map(BDV(_)).reduce(_ + _).toArray).toOption)
})
To be able to use it you'll have to configure checkpointing.