Apache Storm compared to Hadoop


How does Storm compare to Hadoop? Hadoop seems to be the de facto standard for open-source large-scale batch processing. Does Storm have any advantages over Hadoop? Or are they co

6 Answers

生来不讨喜

2021-01-30 02:15

Basically, both of them are used for analyzing big data, but Storm is used for real time processing while Hadoop is used for batch processing.

This is a very good introduction to Storm that I found: Click here

萌比男神i

2021-01-30 02:18

Rather than being compared, they are meant to complement each other, so that together you get batch plus real-time (pseudo-real-time) processing. There is a related video presentation: Ted Dunning on Twitter's Storm.

栀梦

2021-01-30 02:19
Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.

Since many subsystems exist in the Hadoop ecosystem, we have to choose the right one depending on the business requirements and the feasibility of a particular system.

Hadoop MapReduce is efficient for batch processing of one job at a time. This is why Hadoop is used extensively as a data warehousing tool rather than as a data analysis tool.

Since the question is only about Storm vs. Hadoop, have a look at the Storm use cases: financial services, telecom, retail, manufacturing, transportation.
1. Hadoop MapReduce is best suited for batch processing.
2. Storm is a complete stream processing engine and can be used for real-time data analytics with sub-second latency.
Have a look at this dezyre article for a comparison between Hadoop, Storm and Spark. It explains the similarities and differences.

The dezyre article also summarizes the comparison in a picture.
没有蜡笔的小新

2021-01-30 02:25

I used Storm for a while, and I have now left this really good technology for an amazing one: Spark (http://spark.apache.org), which gives developers a unified API for batch and streaming (micro-batch) processing, as well as machine learning and graph processing.

Worth a try.
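To make the "micro-batch" idea concrete, here is a minimal sketch in plain Python. It does not use the Spark API: `micro_batches` and `batch_sums` are hypothetical helper names, and batches are cut by element count, whereas Spark Streaming cuts them by time interval. The point is only that a stream is chopped into small batches and each batch is processed as a unit.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(stream: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Chop a stream into small consecutive batches of at most batch_size items.

    Assumption: batches are cut by count here; a real micro-batch engine
    would normally cut them by a time interval instead.
    """
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def batch_sums(stream: Iterable[int], batch_size: int) -> List[int]:
    """Run a tiny 'batch job' (a sum) on every micro-batch of the stream."""
    return [sum(batch) for batch in micro_batches(stream, batch_size)]


if __name__ == "__main__":
    print(batch_sums(range(1, 8), 3))
```

Each result only becomes available once its batch is complete, which is why micro-batching trades a little latency for a batch-style programming model.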

粉色の甜心

2021-01-30 02:26
Why don't you share your own opinion?
- http://www.infoq.com/news/2011/09/twitter-storm-real-time-hadoop/
- http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
Twitter Storm has been touted as real-time Hadoop. That is more of a marketing take for easy consumption.

They are superficially similar, since both are distributed application solutions. Apart from the typical distributed architectural elements, such as master/slave nodes and ZooKeeper-based coordination, to me the comparison falls off a cliff.

Twitter Storm is more like a pipeline for processing data as it comes. The pipe is what connects the various computing nodes that receive data, compute, and deliver output. (Their lingo is spouts and bolts.) Extend this analogy to a complex pipeline wiring that can be re-engineered when required, and you get Twitter Storm.

In a nutshell, it processes data as it arrives, with very little latency.
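The pipeline analogy can be sketched in a few lines of plain Python. This is not the Storm API: the function names (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) are made up for illustration, and everything runs in one process instead of across a cluster. It only shows the shape of a spout feeding bolts, with each item flowing through as soon as it arrives.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    """Spout: emits items (here, sentences) one at a time as they arrive."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterator[str]) -> Iterator[str]:
    """Bolt: splits each incoming sentence into lowercase words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream: Iterator[str]) -> Iterator[Tuple[str, int]]:
    """Bolt: keeps a running count and emits an update for every word seen."""
    counts: Counter = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(sentences: Iterable[str]) -> List[Tuple[str, int]]:
    """Wire spout -> split bolt -> count bolt and collect every update."""
    return list(count_bolt(split_bolt(sentence_spout(sentences))))


if __name__ == "__main__":
    for word, count in run_topology(["storm is fast", "storm is distributed"]):
        print(word, count)
```

Because the stages are generators, each word reaches the counting stage before the next sentence is even read, which is the "process data as it comes" behaviour described above.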

Hadoop, however, is different in this respect, primarily due to HDFS. It is a solution geared to distributed storage and tolerance of outages at many scales (disks, machines, racks, etc.).

M/R is built to leverage data locality on HDFS to distribute computational jobs. Together, they do not provide a facility for real-time data processing. But that is not always a requirement when you are looking through large data (the needle-in-a-haystack analogy).
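For contrast, the batch model can be sketched as a map, shuffle and reduce over data that is already stored. This is a single-process illustration in plain Python, not Hadoop code: each string in `blocks` stands in for an HDFS block, and the helper names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical. Data locality is only hinted at by running the map step once per block.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(block: str) -> List[Tuple[str, int]]:
    """Map: runs on one stored block and emits a (word, 1) pair per word."""
    return [(word, 1) for word in block.lower().split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: groups all emitted values by key across every block."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapses the list of values for each key into one total."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(blocks: List[str]) -> Dict[str, int]:
    """Run the whole job; no result exists until every block has been read."""
    mapped = [pair for block in blocks for pair in map_phase(block)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(run_job(["storm is fast", "hadoop is batch"]))
```

The result appears only after the full input has been mapped and shuffled, which is why this model suits large stored datasets better than data that must be handled as it arrives.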

In short, Twitter Storm is a distributed real-time data processing solution. I don't think we should compare them. Twitter built it because it needed a facility to process small tweets, but a humongous number of them, in real time.

See HStreaming if you are compelled to compare it with something.
时光说笑

2021-01-30 02:40

Storm is for fast data (real time) and Hadoop is for big data (tons of pre-existing data). Storm can't process big data, but it can generate big data as an output.
