I need to get the final URL after a page redirect preferably with curl or wget.
For example http://google.com may redirect to http://www.google.com
You can do this with wget usually. wget --content-disposition
"url" additionally if you add -O /dev/null
you will not be actually saving the file.
wget -O /dev/null --content-disposition example.com
Thank you. I ended up implementing your suggestions: curl -i + grep
curl -i http://google.com -L | egrep -A 10 '301 Moved Permanently|302 Found' | grep 'Location' | awk -F': ' '{print $2}' | tail -1
Returns blank if the website doesn't redirect, but that's good enough for me as it works on consecutive redirections.
Could be buggy, but at a glance it works ok.
This would work:
curl -I somesite.com | perl -n -e '/^Location: (.*)$/ && print "$1\n"'
curl
's -w option and the sub variable url_effective
is what you are
looking for.
Something like
curl -Ls -o /dev/null -w %{url_effective} http://google.com
More info
-L Follow redirects -s Silent mode. Don't output anything -o FILE Write output to <file> instead of stdout -w FORMAT What to output after completion
More
You might want to add -I (that is an uppercase i
) as well, which will make the command not download any "body", but it then also uses the HEAD method, which is not what the question included and risk changing what the server does. Sometimes servers don't respond well to HEAD even when they respond fine to GET.
curl
can only follow http redirects. To also follow meta refresh directives and javascript redirects, you need a full-blown browser like headless chrome:
#!/bin/bash
real_url () {
printf 'location.href\nquit\n' | \
chromium-browser --headless --disable-gpu --disable-software-rasterizer \
--disable-dev-shm-usage --no-sandbox --repl "$@" 2> /dev/null \
| tr -d '>>> ' | jq -r '.result.value'
}
If you don't have chrome installed, you can use it from a docker container:
#!/bin/bash
real_url () {
printf 'location.href\nquit\n' | \
docker run -i --rm --user "$(id -u "$USER")" --volume "$(pwd)":/usr/src/app \
zenika/alpine-chrome --no-sandbox --repl "$@" 2> /dev/null \
| tr -d '>>> ' | jq -r '.result.value'
}
Like so:
$ real_url http://dx.doi.org/10.1016/j.pgeola.2020.06.005
https://www.sciencedirect.com/science/article/abs/pii/S0016787820300638?via%3Dihub