What does “streaming” mean in Apache Spark and Apache Flink?

前端 未结 1 1938
北海茫月
北海茫月 2021-02-06 01:22

As I went to Apache Spark Streaming Website, I saw a sentence:

Spark Streaming makes it easy to build scalable fault-tolerant streaming applications.

相关标签:
1条回答
  • 2021-02-06 01:42

    Streaming data analysis (in contrast to "batch" data analysis) refers to a continuous analysis of a typically infinite stream of data items (often called events).

    Characteristics of Streaming Applications

    Stream data processing applications are typically characterized by the following points:

    • Streaming applications run continuously, for a very long time, and consume and process events as soon as they appear. In contrast. batch applications gather data in files or databases and process it later.

    • Streaming applications frequently concern themselves with the latency of results. The latency is the delay between the creation of an event and the point when the analysis application has taken that event into account.

    • Because streams are infinite, many computations cannot refer not to the entire stream, but to a "window" over the stream. A window is a view of a sub-sequence of the stream events (such as the last 5 minutes). An example of a real world window statistic is the "average stock price over the past 3 days".

    • In streaming applications, the time of an event often plays a special role. Interpreting events with respect to their order in time is very common. While certain batch applications may do that as well, it not a core concept there.

    Examples of Streaming Applications

    Typical examples of stream data processing application are

    • Fraud Detection: The application tries to figure out whether a transaction fits with the behavior that has been observed before. If it does not, the transaction may indicate an attempted misuse. Typically very latency critical application.

    • Anomaly Detection: The streaming application builds a statistical model of the events it observes. Outliers indicate anomalies and may trigger alerts. Sensor data may be one source of events that one wants to analyze for anomalies.

    • Online Recommenders: If not a lot of past behavior information is available on a user that visits a web shop, it is interesting to learn from her behavior as she navigates the pages and explores articles, and to start generating some initial recommendations directly.

    • Up-to-date Data Warehousing: There are interesting articles on how to model a data warehousing infrastructure as a streaming application, where the event stream is sequence of changes to the database, and the streaming application computes various warehouses as specialized "aggregate views" of the event stream.

    • There are many more ...

    0 讨论(0)
提交回复
热议问题