Are there some code examples of using org.reactivestreams libraries to process large data streams using Java NIO (for high performance)? I\'m aiming at distributed processing, s
Do not use scala.collection.immutable.Stream
to consume files like this, the reason being that it performs memoization - that is, while yes it is lazy it will keep the entire stream buffered (memoized) in memory!
This is definitely not what you want when you think about "stream processing a file". The reason Scala's Stream works like this is because in a functional setting it makes complete sense - you can avoid calculating fibbonachi numbers again and again easily thanks to this for example, for more details see the ScalaDoc.
Akka Streams provides Reactive Streams implementations and provides a FileIO
class that you could use here (it will properly back-pressure and pull the data out of the file only when needed and the rest of the stream is ready to consume it):
import java.io._
import akka.actor.ActorSystem
import akka.stream.scaladsl.{ Sink, Source }
object ExampleApp extends App {
implicit val sys = ActorSystem()
implicit val mat = FlowMaterializer()
FileIO.fromPath(Paths.get("/example/file.txt"))
.map(c ⇒ { print(c); c })
.runWith(Sink.onComplete(_ ⇒ { f.close(); sys.shutdown() } ))
}
Here are more docs about working with IO with Akka Streams Note that this is for the current-as-of writing version of Akka, so the 2.5.x series.
Hope this helps!