How to use Reactive Streams for NIO binary processing?

小蘑菇 2021-02-06 02:35

Are there some code examples of using org.reactivestreams libraries to process large data streams using Java NIO (for high performance)? I'm aiming at distributed processing, s

2 Answers
    野趣味 (OP)
    2021-02-06 02:57

    We actually use akka streams to process binary files. It was a little tricky to get things going as there wasn't any documentation around this, but this is what we came up with:

    import java.io.{BufferedInputStream, File, FileInputStream}
    import akka.stream.scaladsl.Source

    val binFile = new File(filePath)
    val inputStream = new BufferedInputStream(new FileInputStream(binFile))
    // read() returns -1 at end of stream, so takeWhile terminates the Stream there
    val binStream = Stream.continually(inputStream.read).takeWhile(-1 != _).map(_.toByte)
    val binSource = Source(binStream)
    

    Once you have binSource, which is an akka Source[Byte] you can go ahead and start applying whatever stream transformations (map, flatMap, transform, etc...) you want to it. This functionality leverages the Source companion object's apply that takes an Iterable, passing in a scala Stream that should read in the data lazily and make it available to your transforms.
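The lazy byte stream that the Source wraps can be seen working without any Akka dependency at all. This is a minimal, self-contained sketch (the `ByteArrayInputStream` is a hypothetical stand-in for the buffered file stream) showing that `Stream.continually(in.read).takeWhile(-1 != _)` pulls bytes on demand and stops at end of stream:

```scala
import java.io.ByteArrayInputStream

object LazyByteStream extends App {
  // Stand-in for the BufferedInputStream over the real file
  val in = new ByteArrayInputStream(Array[Byte](1, 2, 3))

  // read() yields each byte as an Int in 0..255 and -1 at end of stream;
  // takeWhile(-1 != _) terminates the Stream at that sentinel
  val bytes = Stream.continually(in.read).takeWhile(-1 != _).map(_.toByte)

  assert(bytes.toList == List[Byte](1, 2, 3))
  println(bytes.toList) // prints List(1, 2, 3)
}
```

Since `Stream` is an `Iterable`, this is exactly the shape the `Source` companion's `apply` accepts.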

    EDIT

    As Konrad pointed out in the comments section, a Stream can be an issue with large files due to the fact that it performs memoization of the elements it encounters as it's lazily building out the stream. This can lead to out of memory situations if you are not careful. However, if you look at the docs for Stream there is a tip for avoiding memoization building up in memory:

    One must be cautious of memoization; you can very quickly eat up large amounts of memory if you're not careful. The reason for this is that the memoization of the Stream creates a structure much like scala.collection.immutable.List. So long as something is holding on to the head, the head holds on to the tail, and so it continues recursively. If, on the other hand, there is nothing holding on to the head (e.g. we used def to define the Stream) then once it is no longer being used directly, it disappears.
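The memoization the tip warns about is easy to observe with plain Scala, no Akka involved. In this sketch a `val` holds the Stream's head, so each element's producer runs at most once and every forced element stays cached in memory for as long as `s` is reachable:

```scala
object StreamMemo extends App {
  var reads = 0

  // A val pins the head of the Stream, so forced elements are memoized
  val s: Stream[Int] = Stream.continually { reads += 1; reads }.take(3)

  s.force // first traversal evaluates all three elements
  s.force // second traversal replays the memoized values

  assert(reads == 3) // the producer ran once per element, not once per pass
  println(reads)     // prints 3
}
```

With a large file, that cache is the entire file's contents, which is what leads to the out-of-memory failures described below.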

    So taking that into account, you could modify my original example as follows:

    import java.io.{BufferedInputStream, File, FileInputStream}
    import akka.stream.scaladsl.Source

    // Build the Stream inside a def so that no val holds on to its head
    def binStream(in: BufferedInputStream) = Stream.continually(in.read).takeWhile(-1 != _).map(_.toByte)

    val binFile = new File(filePath)
    val inputStream = new BufferedInputStream(new FileInputStream(binFile))
    // Newer Akka versions spell this Source.fromIterator(() => binStream(inputStream).iterator)
    val binSource = Source(() => binStream(inputStream).iterator)
    

    So the idea here is to build the Stream via a def and not assign it to a val, and then immediately get the iterator from it and use that to initialize the Akka Source. Setting things up this way should avoid the issues with memoization. I ran the old code against a big file and was able to produce an OutOfMemoryError by doing a foreach on the Source. When I switched it over to the new code I was able to avoid this issue.
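The def-plus-iterator pattern can also be exercised without Akka. In this sketch (again with a hypothetical `ByteArrayInputStream` standing in for the file) the Stream is produced by a def and only its iterator is retained, so nothing pins the head and consumed elements become garbage-collectable as the iterator advances:

```scala
import java.io.{BufferedInputStream, ByteArrayInputStream}

object IteratorNoRetention extends App {
  // Same shape as the answer's helper: the Stream is created inside a def
  def binStream(in: BufferedInputStream): Stream[Byte] =
    Stream.continually(in.read).takeWhile(-1 != _).map(_.toByte)

  // Hypothetical stand-in for new FileInputStream(binFile)
  val in = new BufferedInputStream(new ByteArrayInputStream(Array[Byte](10, 20, 30)))

  // Only the iterator is kept; no val references the Stream's head,
  // so already-consumed elements are free to be garbage collected
  val it  = binStream(in).iterator
  val out = it.toList

  assert(out == List[Byte](10, 20, 30))
  println(out) // prints List(10, 20, 30)
}
```

This is the same ownership structure the corrected Akka code sets up: the Source holds the iterator, not the Stream.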
