Apache Kafka: Distributed messaging system
Apache Storm: Real Time Message Processing
How we can use both technologies in a real-time data pipeline for processing
When I have a use case that requires me to visualize or alert on patterns (think of twitter trends), while continuing to process the events, I have a several patterns.
NiFi would allow me to process an event and update a persistent data store with low(er) batch aggregation with very, very little custom coding.
Storm (lots of custom coding) allows me nearly real time access to the trending events.
If I can wait for many seconds, then I can batch out of kafka, into hdfs (Parquet) and process.
If I need to know in seconds, I need NiFi, and probably even Storm. (Think of monitoring thousands of earth stations, where I need to see small region weather conditions for tornado warnings).