Get final URL after curl is redirected

前端未结

关注

 11  1818

清酒与你

I need to get the final URL after a page redirect preferably with curl or wget.

For example http://google.com may redirect to http://www.google.com

相关标签:

11条回答

清酒与你

2020-12-04 08:00

You can do this with wget usually. wget --content-disposition "url" additionally if you add -O /dev/null you will not be actually saving the file.

wget -O /dev/null --content-disposition example.com

0 讨论(0)
发布评论:

提交评论
- 加载中...
孤街浪徒

2020-12-04 08:02
Thank you. I ended up implementing your suggestions: curl -i + grep
```
curl -i http://google.com -L | egrep -A 10 '301 Moved Permanently|302 Found' | grep 'Location' | awk -F': ' '{print $2}' | tail -1
```
Returns blank if the website doesn't redirect, but that's good enough for me as it works on consecutive redirections.

Could be buggy, but at a glance it works ok.
0 讨论(0)
发布评论:

提交评论
- 加载中...
既然无缘

2020-12-04 08:02
This would work:
```
 curl -I somesite.com | perl -n -e '/^Location: (.*)$/ && print "$1\n"'
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
北海茫月

2020-12-04 08:03
curl's -w option and the sub variable url_effective is what you are looking for.

Something like
```
curl -Ls -o /dev/null -w %{url_effective} http://google.com
```
More info
```
-L         Follow redirects
-s         Silent mode. Don't output anything
-o FILE    Write output to <file> instead of stdout
-w FORMAT  What to output after completion
```
More

You might want to add -I (that is an uppercase i) as well, which will make the command not download any "body", but it then also uses the HEAD method, which is not what the question included and risk changing what the server does. Sometimes servers don't respond well to HEAD even when they respond fine to GET.
0 讨论(0)
发布评论:

提交评论
- 加载中...

無奈伤痛

2020-12-04 08:06

curl can only follow http redirects. To also follow meta refresh directives and javascript redirects, you need a full-blown browser like headless chrome:

#!/bin/bash
real_url () {
    printf 'location.href\nquit\n' | \
    chromium-browser --headless --disable-gpu --disable-software-rasterizer \
    --disable-dev-shm-usage --no-sandbox --repl "$@" 2> /dev/null \
    | tr -d '>>> ' | jq -r '.result.value'
}

If you don't have chrome installed, you can use it from a docker container:

#!/bin/bash
real_url () {
    printf 'location.href\nquit\n' | \
    docker run -i --rm --user "$(id -u "$USER")" --volume "$(pwd)":/usr/src/app \
    zenika/alpine-chrome --no-sandbox --repl "$@" 2> /dev/null \
    | tr -d '>>> ' | jq -r '.result.value'
}

Like so:

$ real_url http://dx.doi.org/10.1016/j.pgeola.2020.06.005 
https://www.sciencedirect.com/science/article/abs/pii/S0016787820300638?via%3Dihub

0 讨论(0)

上一页 1 2