StormCrawler: Timeout waiting for connection from pool

Posted by 三世轮回 on 2019-12-11 16:14:00

Question


We consistently get the following error when we increase either the number of threads or the number of executors for the Fetcher bolt.

org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:286) ~[stormjar.jar:?]
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:263) ~[stormjar.jar:?]
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190) ~[stormjar.jar:?]
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[stormjar.jar:?]
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[stormjar.jar:?]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71) ~[stormjar.jar:?]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:220) ~[stormjar.jar:?]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:164) ~[stormjar.jar:?]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:139) ~[stormjar.jar:?]
at com.digitalpebble.stormcrawler.protocol.httpclient.HttpProtocol.getProtocolOutput(HttpProtocol.java:206) ~[stormjar.jar:?]

Is this due to a resource leak or some hard limit on the size of the http thread pool? If it is about the thread pool, is there any way to increase the pool size?


Answer 1:


HttpProtocol caps the connection pool at the number of fetcher threads (fetcher.threads.number). Because the pool is static, it is shared by all the executors on the same worker, so several FetcherBolt executors on one worker compete for a pool sized for only one of them. I'd recommend that you use one FetcherBolt instance per worker; the pool size will then match fetcher.threads.number and you won't have this problem.
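To see why extra executors exhaust the pool, here is a minimal sketch of the arithmetic. It is not StormCrawler code: the class and method names are hypothetical, and a java.util.concurrent.Semaphore stands in for the static connection pool. It assumes the worst case, where every fetcher thread holds its connection at the same moment.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model, not part of StormCrawler: a Semaphore plays the role
// of the static connection pool shared by all executors on one worker.
class PoolContentionSketch {

    // Counts the lease attempts that fail when `executors` FetcherBolt
    // executors, each running `threadsPerExecutor` threads, share a pool
    // capped at `poolSize` connections and nobody releases a connection.
    static int failedLeases(int poolSize, int executors, int threadsPerExecutor) {
        Semaphore pool = new Semaphore(poolSize);
        int failed = 0;
        for (int i = 0; i < executors * threadsPerExecutor; i++) {
            // A non-blocking tryAcquire stands in for a lease request whose
            // timeout elapses (the ConnectionPoolTimeoutException case).
            if (!pool.tryAcquire()) {
                failed++;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // One executor per worker: demand equals the pool size.
        System.out.println(failedLeases(10, 1, 10)); // prints 0
        // Two executors on the same worker: half of the requests cannot be served.
        System.out.println(failedLeases(10, 2, 10)); // prints 10
    }
}
```

In a real crawl connections are released and reused, so not every surplus request fails, but the pool stays undersized whenever more than one executor shares it.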

Alternatively, you could give the okhttp protocol a try. It is more robust for open and large-scale crawls. See the wiki page on protocols for a feature comparison.
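Switching protocol implementations is a configuration change. The sketch below builds the overrides as a plain Map, which is how Storm topology configuration is represented. The keys http.protocol.implementation and https.protocol.implementation and the okhttp class name are assumptions not stated in the answer above; check them against the crawler-default.yaml of your StormCrawler version before relying on them. The class and method names of the sketch itself are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of StormCrawler.
class OkHttpProtocolConfigSketch {

    // Assumed class name of the okhttp-based protocol implementation.
    static final String OKHTTP_PROTOCOL =
            "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol";

    // Returns the configuration overrides that would select okhttp for
    // both http and https URLs (key names are assumptions, see above).
    static Map<String, Object> protocolOverrides() {
        Map<String, Object> conf = new HashMap<>();
        conf.put("http.protocol.implementation", OKHTTP_PROTOCOL);
        conf.put("https.protocol.implementation", OKHTTP_PROTOCOL);
        return conf;
    }

    public static void main(String[] args) {
        // Print each override as a key/value pair.
        protocolOverrides().forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

The same two entries can instead be written as key/value lines in the crawler configuration file, which avoids recompiling the topology.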



Source: https://stackoverflow.com/questions/49149490/stormcrawler-timeout-waiting-for-connection-from-pool
