Question
How is 'capacity' calculated?
From the Apache Storm documentation:
The "capacity" metric is very useful and tells you what % of the time in the last 10 minutes the bolt spent executing tuples. If this value is close to 1, then the bolt is "at capacity" and is a bottleneck in your topology. The solution to at-capacity bolts is to increase the parallelism of that bolt.
I don't quite understand "% of the time". If the value is 0.75, what does it really mean?
Answer 1:
It's the fraction of time that the bolt is busy rather than idle. A value of 0.75 means the bolt spends 75% of the time executing tuples and the remaining 25% waiting for new data to process.
Let's say you receive a new input tuple every second and your bolt takes 0.1 seconds to process it. The bolt will be idle 90% of the time, and the capacity will be 0.1.
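The arithmetic in that example can be sketched in a few lines. This assumes the commonly described formula behind the Storm UI value (tuples executed, multiplied by average execute latency, divided by the length of the measurement window); the function and parameter names are made up for illustration and are not part of Storm's API.

```python
# Minimal sketch of the capacity arithmetic. Names are hypothetical,
# and the formula is an assumption about how the UI derives the value.

TEN_MINUTES_MS = 10 * 60 * 1000  # the 10-minute window mentioned in the docs


def capacity(executed: int, avg_execute_latency_ms: float,
             window_ms: float = TEN_MINUTES_MS) -> float:
    """Fraction of the window the bolt spent executing tuples."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return executed * avg_execute_latency_ms / window_ms


# One tuple per second for 10 minutes (600 tuples), 0.1 s (100 ms) each.
busy_fraction = capacity(executed=600, avg_execute_latency_ms=100.0)
print(f"capacity = {busy_fraction:.2f}")  # prints "capacity = 0.10"
```

With the same tuple rate but 750 ms per tuple, the same function gives 0.75, which is the case asked about in the question.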
Another example: imagine you receive more data in real time than you can process, and you have two bolts where the first takes longer per tuple than the second, so the first bolt is your bottleneck. The capacity of the first bolt will be around 1, and the capacity of the second will be below 1.
In both examples, you can use this number to estimate the parallelism (or processing power) each bolt needs.
If the first bolt's capacity is 1 and the second's is 0.5, you probably want to give the first bolt twice as many executors as the second. At the same time (and most importantly), you have to keep increasing the number of executors until that bolt's capacity is below 1, so you can be sure your topology keeps up with all the data coming in real time.
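As a rough way to turn that advice into a number, here is a sketch that scales the executor count by the ratio of observed capacity to a target capacity. It assumes tuples are spread evenly across executors and that capacity falls in proportion as executors are added, which is not guaranteed. The function name and the 0.8 default target are hypothetical choices, not Storm settings.

```python
import math


def executors_needed(current_executors: int, observed_capacity: float,
                     target_capacity: float = 0.8) -> int:
    """Estimate how many executors bring capacity down to the target.

    Assumes load is evenly distributed and scales linearly (an idealization).
    """
    if current_executors < 1:
        raise ValueError("current_executors must be at least 1")
    if not 0 < target_capacity <= 1:
        raise ValueError("target_capacity must be in (0, 1]")
    estimate = current_executors * observed_capacity / target_capacity
    return max(1, math.ceil(estimate))


# A bolt with 4 executors running at capacity 0.5 is already under the target.
print(executors_needed(4, 0.5))
# A bolt with 10 executors at capacity 0.9, aiming for 0.5.
print(executors_needed(10, 0.9, target_capacity=0.5))
```

Because the reported capacity levels off near 1 once a bolt is saturated, an estimate taken from a bottlenecked bolt is only a lower bound. Re-check the metric after each change and keep adding executors until it stays below 1.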
Source: https://stackoverflow.com/questions/42991459/apache-storm-ui-capacity-metric