Question
I'm using a Java program to get expanded URLs from short URLs. Given a Java URLConnection, which of the following two approaches is better for getting the desired result?
Connection.getHeaderField("Location");
vs
Connection.getURL();
I guess both of them give the same output. The first approach did not give me good results: only 1 out of 7 URLs was resolved. Can the success rate be improved with the second approach?
Is there any other, better approach?
Answer 1:
I'd use the following:
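import static org.junit.Assert.assertEquals;
import java.net.HttpURLConnection;
import java.net.URL;
import org.junit.Test;
public class LocationTest { // class name is illustrative
    @Test
    public void testLocation() throws Exception {
        final String link = "http://bit.ly/4Agih5";
        final URL url = new URL(link);
        final HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
        // Do not follow the redirect; only the response from bit.ly is fetched.
        urlConnection.setInstanceFollowRedirects(false);
        // Reading a header implicitly sends the request.
        final String location = urlConnection.getHeaderField("location");
        assertEquals("http://stackoverflow.com/", location);
        // getURL() still returns the original short link.
        assertEquals(link, urlConnection.getURL().toString());
    }
}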
With setInstanceFollowRedirects(false) the HttpURLConnection does not follow redirects, so the destination page (stackoverflow.com in the above example) will not be downloaded, only the redirect page from bit.ly.
One drawback is that when a resolved bit.ly URL points to another short URL, for example on tinyurl.com, you will get the tinyurl.com link, not what tinyurl.com redirects to.
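One way around that drawback is to repeat the same request hop by hop until a response no longer carries a Location header. The following is only a minimal sketch of that idea, not a tested production method: the class name RedirectChain, the methods fetchLocation and expand, and the maxHops limit are all made up for illustration. The lookup is passed in as a function so the loop can be tried without network access.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.util.function.Function;

// Hypothetical helper, not part of the original answer.
public class RedirectChain {

    // Sends one request without following redirects and returns the
    // Location header, or null if the response has none.
    static String fetchLocation(String link) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) new URL(link).openConnection();
            connection.setInstanceFollowRedirects(false);
            try {
                return connection.getHeaderField("Location");
            } finally {
                connection.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Follows Location headers one hop at a time, at most maxHops times
    // (an assumed safety limit against redirect loops).
    static String expand(String link, int maxHops,
                         Function<String, String> locationOf) {
        String current = link;
        for (int hop = 0; hop < maxHops; hop++) {
            String location = locationOf.apply(current);
            if (location == null) {
                return current; // no further redirect
            }
            // Location may be relative, so resolve it against the current URL.
            current = URI.create(current).resolve(location).toString();
        }
        return current;
    }
}
```

It could be called as RedirectChain.expand("http://bit.ly/4Agih5", 5, RedirectChain::fetchLocation). If the hop limit is reached, the last URL seen is returned, which may still be a short URL.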
Edit:
To see the response of bit.ly, use curl:
$ curl --dump-header /tmp/headers http://bit.ly/4Agih5
<html>
<head>
<title>bit.ly</title>
</head>
<body>
<a href="http://stackoverflow.com/">moved here</a>
</body>
</html>
As you can see, bit.ly sends only a short redirect page. Then check the HTTP headers:
$ cat /tmp/headers
HTTP/1.0 301 Moved Permanently
Server: nginx
Date: Wed, 06 Nov 2013 08:48:59 GMT
Content-Type: text/html; charset=utf-8
Cache-Control: private; max-age=90
Location: http://stackoverflow.com/
Mime-Version: 1.0
Content-Length: 117
X-Cache: MISS from cam
X-Cache-Lookup: MISS from cam:3128
Via: 1.1 cam:3128 (squid/2.7.STABLE7)
Connection: close
It sends a 301 Moved Permanently response with a Location header (which points to http://stackoverflow.com/). Modern browsers don't show you the HTML page above. Instead, they automatically redirect you to the URL in the Location header.
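Since the target only appears in the Location header of a redirect response, it may help to check the status code before reading the header, so that a non-redirect response is not mistaken for a resolved link. A minimal sketch, assuming a connection that has not been opened yet; the class name RedirectStatus and both method names are made up for illustration:

```java
import java.io.IOException;
import java.net.HttpURLConnection;

// Hypothetical helper, not part of the original answer.
public class RedirectStatus {

    // True for the HTTP status codes that redirect via a Location header.
    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM   // 301
                || status == HttpURLConnection.HTTP_MOVED_TEMP // 302
                || status == HttpURLConnection.HTTP_SEE_OTHER  // 303
                || status == 307
                || status == 308;
    }

    // Returns the Location header if the response is a redirect, else null.
    static String redirectTarget(HttpURLConnection connection) throws IOException {
        connection.setInstanceFollowRedirects(false);
        int status = connection.getResponseCode();
        return isRedirect(status) ? connection.getHeaderField("Location") : null;
    }
}
```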
Answer 2:
The following link contains a more complete method along the same lines as the previous answer: https://github.com/cpdomina/WebUtils/blob/master/src/net/cpdomina/webutils/URLUnshortener.java
Source: https://stackoverflow.com/questions/7793827/how-to-get-the-complete-url-address-most-efficiently