Commons IO gets HTTP 403 for a URL, but HttpClient works

Posted by 被刻印的时光 ゝ on 2019-12-12 12:11:53

Question


Commons IO code:

String resultURL = String.format(GOOGLE_RECOGNIZER_URL, URLEncoder.encode("hello", "UTF-8"), "en-US");
URI uri = new URI(resultURL);
byte[] resultIO = IOUtils.toByteArray(uri);

I got this exception:

Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://translate.google.cn/translate_tts?ie=UTF-8&q=hello&tl=en-US&total=1&idx=0&textlen=3
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:654)
    at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:635)
    at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:617)
    at com.renren.intl.soundsns.simsimi.speech.ttsclient.impl.GoogleTTSClient.main(GoogleTTSClient.java:70)

But when I use HttpClient, the same request succeeds:

String resultURL = String.format(GOOGLE_RECOGNIZER_URL, URLEncoder.encode(text, "UTF-8"), "en-US");
HttpClient client = new HttpClient();
GetMethod g = new GetMethod(resultURL);
client.executeMethod(g);
byte[] resultByte = g.getResponseBody();

Why does this happen?

Thanks in advance :)

Maven dependencies:

<dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.4</version>
</dependency>
<dependency>
        <groupId>commons-httpclient</groupId>
        <artifactId>commons-httpclient</artifactId>
        <version>3.1</version>
</dependency>

Answer 1:


Jon Skeet is right!

For me, with java.net.URL the JVM sends the following headers:

User-Agent: Java/1.7.0_10
Host: translate.google.cn
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive

With Apache HttpClient:

User-Agent: Jakarta Commons-HttpClient/3.1
Host: translate.google.cn
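
To check which User-Agent your own JVM sends, you can point java.net.URL at a throwaway local server and record the header it receives. This is only a sketch: it assumes Java 8+ and the JDK's bundled com.sun.net.httpserver.HttpServer, and the names HeaderProbe and observedUserAgent are made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

public class HeaderProbe {

    /** Returns the User-Agent header that java.net.URL sent to a local test server. */
    public static String observedUserAgent() throws IOException {
        AtomicReference<String> seen = new AtomicReference<>();

        // Port 0 asks the OS for any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            // Record the header, then answer with a tiny 200 response.
            seen.set(exchange.getRequestHeaders().getFirst("User-Agent"));
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            try (InputStream in = url.openStream()) {
                while (in.read() != -1) {
                    // Drain the response; only the request headers matter here.
                }
            }
        } finally {
            server.stop(0);
        }
        return seen.get();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("User-Agent: " + observedUserAgent());
    }
}
```

Running main prints the header value as the server saw it, which you can compare with the lists above.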

And if you change the user agent for java.net.URL:

System.setProperty("http.agent", "Jakarta Commons-HttpClient/3.1");

the request succeeds, without the HTTP 403.

It looks like you get a 403 if your User-Agent starts with Java: any user agent matching the pattern Java.* is rejected with a 403, while one matching .+Java.* is accepted.
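
The http.agent system property applies to the whole JVM. If you would rather change the header for a single request, a possible alternative is to open the connection yourself and set User-Agent on it before reading. The sketch below uses only java.net and java.io; the class name UserAgentFetch, the method fetch, and the user agent string in the usage note are illustrative assumptions, not part of the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class UserAgentFetch {

    /**
     * Downloads the URL with an explicit User-Agent header.
     * Throws IOException if the server answers with an error status such as 403.
     */
    public static byte[] fetch(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Must be set before the request is sent, i.e. before getInputStream().
        conn.setRequestProperty("User-Agent", userAgent);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage would look like byte[] data = UserAgentFetch.fetch(new URL(resultURL), "Mozilla/5.0 (example)"); where the agent string is a placeholder. Since Commons IO is already on the classpath, the manual read loop could be replaced with IOUtils.toByteArray(in).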



Source: https://stackoverflow.com/questions/14996140/commons-io-403-for-url-but-httpclient-is-ok
