Apache HTTPClient throws java.net.SocketException: Connection reset for many domains

北恋 2020-12-10 06:07

I'm creating a (well-behaved) web spider, and I notice that some servers cause Apache HttpClient to throw a SocketException, specifically:

java.net         


        
3 answers
  • 2020-12-10 06:14

    First, to answer your question:

    The connection reset was caused by a problem on the server side. Most likely the server failed to parse the request or was unable to process it and dropped the connection as a result without returning a valid response. There is likely something in the HTTP requests generated by HttpClient that causes server side logic to fail, probably due to a server side bug. Just because the error message does not say 'by peer' does not mean the connection reset took place on the client side.

    A few remarks:

    (1) Several popular web crawlers, such as bixo (http://openbixo.org/), use HttpClient without major issues, but pretty much all of them had to tweak HttpClient's behavior to make it more lenient toward common HTTP protocol violations. By default, HttpClient is rather strict about HTTP protocol compliance.

    (2) Why didn't you report the NPE problem, or any other problem you have been experiencing, to the HttpClient project?
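
The claim in this answer that a server-side drop shows up on the client as a reset can be reproduced without HttpClient at all. The following is a minimal sketch using only `java.net`; the class name `ResetDemo` and the method `provokeReset` are made up for illustration, and the local "server" simply aborts the connection (via `SO_LINGER` with a zero timeout) to stand in for a remote server that gives up on a request.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of HttpClient.
class ResetDemo {

    /** Connects to a local server that aborts the connection; returns the client-side exception (or null). */
    static IOException provokeReset() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread aborter = new Thread(() -> {
                try {
                    Socket accepted = server.accept();
                    accepted.setSoLinger(true, 0); // close() now aborts with a TCP RST instead of a FIN
                    accepted.close();
                } catch (IOException ignored) {
                    // nothing useful to do in a demo
                }
            });
            aborter.start();

            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                aborter.join();                  // wait until the server side has dropped the connection
                client.getInputStream().read();  // the client only notices the reset when it reads
                return null;                     // no exception: the reset was not observed
            } catch (IOException e) {
                return e;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(provokeReset());
    }
}
```

The client did nothing wrong here, yet the exception is raised in client code. The exact message text depends on the JVM and operating system, so it is not a reliable indicator of which side reset the connection.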

  • 2020-12-10 06:24

    Try getting a network trace using Wireshark, and augment that with log4j logging from HttpClient. That should show why the connection is being reset.
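
As a rough sketch of the logging half of this suggestion: HttpClient logs through Commons Logging, so wire and header logging can be switched on with system properties before the client is created. This example swaps in Commons Logging's built-in `SimpleLog` instead of log4j to avoid a configuration file. The class name `WireLogSetup` is hypothetical, and which logger names take effect depends on the HttpClient version in use (the question does not say), so both the 4.x and 3.x names are set here as an assumption.

```java
// Hypothetical helper; call it before any HttpClient class is loaded.
class WireLogSetup {

    static final String PREFIX = "org.apache.commons.logging.simplelog.";

    /** Routes Commons Logging to SimpleLog and raises HttpClient's wire/header loggers to DEBUG. */
    static void enableWireLogging() {
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        System.setProperty(PREFIX + "showdatetime", "true");

        // Logger names used by HttpClient 4.x (assumption: a 4.x client)
        System.setProperty(PREFIX + "log.org.apache.http.wire", "DEBUG");
        System.setProperty(PREFIX + "log.org.apache.http.headers", "DEBUG");

        // Logger name used by Commons HttpClient 3.x (assumption: a 3.x client)
        System.setProperty(PREFIX + "log.httpclient.wire", "DEBUG");
    }

    public static void main(String[] args) {
        enableWireLogging();
        System.out.println(System.getProperty(PREFIX + "log.org.apache.http.wire"));
    }
}
```

With wire logging on, the last request bytes sent before the reset appear in the log and can be lined up against the Wireshark capture.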

  • 2020-12-10 06:38

    These two settings will sometimes help:

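     // 0 disables the socket read timeout (SO_TIMEOUT), so reads block indefinitely
     client.getParams().setParameter("http.socket.timeout", Integer.valueOf(0));
     // check whether a connection has gone stale before reusing it
     client.getParams().setParameter("http.connection.stalecheck", Boolean.TRUE);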
    

    The first sets the socket timeout to infinite. The second makes HttpClient check whether a connection has gone stale before reusing it.
