I have been reading up on Java 8 Streams and the way data is streamed from a data source, rather than having the entire collection available to extract data from.
This quote in particular stood out to me:
Think of the stream as a nozzle connected to the water tank that is your data structure. The nozzle doesn't have its own storage. Sure, the water (data) the stream provides is coming from a source that has storage, but the stream itself has no storage. Connecting another nozzle (stream) to your tank (data structure) won't require storage for a whole new copy of the data.
A stream is just a view of the data. It has no storage of its own, and you can't modify the underlying collection (assuming it's a stream that was built on top of a collection) through the stream. It's like read-only access.
If you have any RDBMS experience, it's the same idea as a database view.
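A quick sketch of that "view" idea (the list of names here is just a made-up example): the pipeline reads from the list and produces new values downstream, but the source list is never copied or modified through the stream.

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

List<String> names = Arrays.asList("Ada", "Alan", "Grace");

// The stream is only a view over 'names'; map() produces new elements
// downstream without copying or changing the source list.
List<String> upper = names.stream()
                          .map(String::toUpperCase)
                          .collect(Collectors.toList());

System.out.println(names); // [Ada, Alan, Grace] -- source unchanged
System.out.println(upper); // [ADA, ALAN, GRACE]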
The statement about streams and storage means that a stream doesn't have any storage of its own. If the stream's source is a collection, then obviously that collection has storage to hold the elements.
Let's take one of the examples from that article:
int sum = shapes.stream()
.filter(s -> s.getColor() == BLUE)
.mapToInt(s -> s.getWeight())
.sum();
Assume that shapes is a Collection that has millions of elements. One might imagine that the filter operation would iterate over the elements from the source and create a temporary collection of results, which might also have millions of elements. The mapToInt operation might then iterate over that temporary collection and generate its results to be summed.
That's not how it works. There is no temporary, intermediate collection. The stream operations are pipelined, so elements emerging from filter are passed through mapToInt and thence to sum without being stored into and read from a collection.
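You can make that pipelining visible with peek() (a standard Stream method). Since the article's Shape class isn't shown here, this sketch uses plain integers instead; the printed lines interleave per element, which shows that each element travels through the whole pipeline before the next one starts, with no intermediate collection.

import java.util.Arrays;
import java.util.List;

List<Integer> numbers = Arrays.asList(1, 2, 3, 4);

int sum = numbers.stream()
                 .peek(n -> System.out.println("filter sees " + n))
                 .filter(n -> n % 2 == 0)
                 .peek(n -> System.out.println("  mapToInt sees " + n))
                 .mapToInt(Integer::intValue)
                 .sum();

// Output: "filter sees 1", "filter sees 2", "  mapToInt sees 2",
// "filter sees 3", "filter sees 4", "  mapToInt sees 4" -- one element
// at a time through the whole pipeline.
System.out.println("sum = " + sum); // sum = 6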
If the stream source weren't a collection -- say, elements were being read from a network connection -- there needn't be any storage at all. A pipeline like the following:
int sum = streamShapesFromNetwork()
.filter(s -> s.getColor() == BLUE)
.mapToInt(s -> s.getWeight())
.sum();
might process millions of elements, but it wouldn't need to store millions of elements anywhere.
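streamShapesFromNetwork() is a hypothetical method in that example, so here is an equivalent sketch with a generated source: LongStream produces a million values on demand, and at no point does anything hold them all in memory.

import java.util.stream.LongStream;

// One million elements flow through the pipeline one at a time;
// neither the source nor the operations store them anywhere.
long sum = LongStream.rangeClosed(1, 1_000_000)
                     .filter(n -> n % 2 == 0)
                     .sum();

System.out.println(sum); // 250000500000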
A Collection is a data structure. Based on the problem, you decide which collection to use, such as ArrayList or LinkedList (considering time and space complexity). A Stream, on the other hand, is just a processing tool that makes your life easier.
Another difference is that a Collection is an in-memory data structure where you can add and remove elements, whereas on a Stream you can perform two kinds of operation (see the sketch after this list):
a. Intermediate operations: filter, map, sorted, limit on the result set
b. Terminal operations: forEach, collect the result set into a collection
But notice that with a stream you can't add or remove elements.
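Here is a small sketch of both kinds of operation on an assumed list of names; the intermediate operations only describe the pipeline, and nothing runs until a terminal operation is invoked.

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

List<String> names = Arrays.asList("John", "Peter", "Sachin", "Anna");

// Intermediate operations: filter, map, sorted, limit
// Terminal operation: collect
List<String> result = names.stream()
                           .filter(n -> n.length() > 4)
                           .map(String::toUpperCase)
                           .sorted()
                           .limit(2)
                           .collect(Collectors.toList());

// Another terminal operation: forEach
result.forEach(System.out::println); // PETER, SACHIN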
A Stream is a kind of iterator: you can traverse a collection through a stream. Note that you can traverse a stream only once. Let me give you an example for better understanding:
Example 1:
List<String> employeeNameList = Arrays.asList("John", "Peter", "Sachin");
Stream<String> s = employeeNameList.stream();
// iterate through the list
s.forEach(System.out::println); // this works perfectly fine
s.forEach(System.out::println); // this throws an IllegalStateException: stream has already been operated upon or closed
So what you can infer is that you can iterate over a collection as many times as you want, but once a stream has been traversed it is spent; it won't remember what it was doing, so you need to obtain a new stream and instruct it again.
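If you do need to traverse again, simply ask the collection for a fresh stream. Continuing the example above:

// Each call to stream() returns a brand-new, single-use stream over the list.
employeeNameList.stream().forEach(System.out::println); // first traversal
employeeNameList.stream().forEach(System.out::println); // second traversal: new stream, no exception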
I hope this is clear.
The previous answers are mostly correct. Still, a more intuitive way to think about it follows (for people landing here from Google):
Think of streams as UNIX pipelines of text: cat input.file | sed ... | grep ... > output.file
In general, those UNIX text utilities consume a small amount of RAM compared to the input data they process.
That's not always the case, though. Think of sort: that algorithm needs to keep intermediate data in memory. The same is true for streams: sometimes temporary data is needed, but most of the time it is not.
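In Java stream terms, sorted() plays the role of that sort command: it is a stateful intermediate operation, so it has to buffer every element that reaches it before it can emit the first one. A minimal sketch:

import java.util.Arrays;
import java.util.List;

List<Integer> numbers = Arrays.asList(5, 3, 1, 4, 2);

// filter works element by element with no buffering, but sorted() must
// see (and hold) all surviving elements before passing the smallest one on.
numbers.stream()
       .filter(n -> n > 1)
       .sorted()
       .forEach(System.out::println); // 2, 3, 4, 5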
As an extra analogy, to some extent "cloud serverless" APIs follow this same UNIX-pipeline / Java-stream design. They do not exist in memory until they have some input data to process; the cloud platform launches them and injects the input data, and the output is sent gradually somewhere else, so the serverless API does not consume many resources (most of the time).
These are not absolute truths, of course, but they capture the general idea.