Get final URL after curl is redirected

清酒与你 2020-12-04 07:38

I need to get the final URL after a page redirect, preferably with curl or wget.

For example, http://google.com may redirect to http://www.google.com.

11 Answers
  • 2020-12-04 08:00

    You can usually do this with wget: run wget --content-disposition "url". If you also add -O /dev/null, the file is not actually saved.

    wget -O /dev/null --content-disposition example.com
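
    wget reports each redirect it follows on stderr, as a line of the form "Location: <url> [following]", so one way to recover the final URL is to keep the last such line. A minimal sketch, assuming the log format of GNU wget 1.x in an English locale (other versions or locales may print it differently); last_wget_location is a hypothetical helper name, not part of wget:

```shell
# Hypothetical helper: read a wget log on stdin and print the target of
# the last "Location: ... [following]" line (prints nothing if none).
last_wget_location() {
    sed -n 's/^Location: \(.*\) \[following\]$/\1/p' | tail -n 1
}

# Usage (needs network access); wget writes its log to stderr:
# wget -O /dev/null http://google.com 2>&1 | last_wget_location
```

    If the site does not redirect, the helper prints nothing.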

  • 2020-12-04 08:02

    Thank you. I ended up implementing your suggestions: curl -i + grep

    curl -i http://google.com -L | grep -E -A 10 '301 Moved Permanently|302 Found' | grep 'Location' | awk -F': ' '{print $2}' | tail -1
    

    Returns blank if the website doesn't redirect, but that's good enough for me as it works on consecutive redirections.

    Could be buggy, but at a glance it works ok.
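
    Some of the likely bugs: matching only "301 Moved Permanently" and "302 Found" misses other redirect codes (303, 307, 308) and HTTP/2 responses, whose status lines carry no reason phrase and whose header names are lowercase; the printed value also ends with a carriage return. A sketch that sidesteps these by looking only at the Location headers, assuming raw response headers arrive on stdin; last_location is a hypothetical name:

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and print
# the value of the last Location header, without the trailing CR.
last_location() {
    tr -d '\r' \
        | awk 'tolower($1) == "location:" { loc = $2 } END { if (loc != "") print loc }'
}

# Usage (needs network access); -D - dumps headers to stdout while
# -o /dev/null discards the body, so a normal GET is still used:
# curl -sL -o /dev/null -D - http://google.com | last_location
```

    The value is printed exactly as the server sent it, so it may be a relative path rather than a full URL.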

  • 2020-12-04 08:02

    This would work for a single redirect (it prints the Location header of the first response and does not follow it):

    curl -I somesite.com | perl -n -e '/^Location: (.*)$/ && print "$1\n"'
    
  • 2020-12-04 08:03

    curl's -w option and its url_effective variable are what you are looking for.

    Something like

    curl -Ls -o /dev/null -w '%{url_effective}\n' http://google.com
    

    More info

    -L         Follow redirects
    -s         Silent mode. Don't output anything
    -o FILE    Write output to <file> instead of stdout
    -w FORMAT  What to output after completion
    

    More

    You might want to add -I (an uppercase i) as well, so that the command does not download any body. Note, however, that -I switches the request to the HEAD method, which is not what the question asked for and risks changing what the server does: some servers don't respond well to HEAD even when they respond fine to GET.
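
    For repeated use, the command can be wrapped in a small shell function. This is only a sketch: final_url is a hypothetical name, and the redirect cap of 10 is an arbitrary choice rather than anything the question requires.

```shell
# Hypothetical wrapper: print the final URL for each argument, one per line.
final_url() {
    for url in "$@"; do
        # -L follows redirects, -s silences progress output, -o /dev/null
        # discards the body, -w prints the effective URL when curl is done.
        # --max-redirs caps the redirect chain so a loop cannot run forever.
        curl -Ls --max-redirs 10 -o /dev/null -w '%{url_effective}\n' "$url"
    done
}

# Usage (needs network access):
# final_url http://google.com
```

    Because it uses GET rather than HEAD, the server sees the same kind of request a browser would send.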

  • 2020-12-04 08:06

    curl can only follow HTTP redirects. To also follow meta refresh directives and JavaScript redirects, you need a full-blown browser such as headless Chrome:

    #!/bin/bash
    real_url () {
        printf 'location.href\nquit\n' | \
        chromium-browser --headless --disable-gpu --disable-software-rasterizer \
        --disable-dev-shm-usage --no-sandbox --repl "$@" 2> /dev/null \
        | tr -d '>>> ' | jq -r '.result.value'
    }
    

    If you don't have Chrome installed, you can run it from a Docker container:

    #!/bin/bash
    real_url () {
        printf 'location.href\nquit\n' | \
        docker run -i --rm --user "$(id -u "$USER")" --volume "$(pwd)":/usr/src/app \
        zenika/alpine-chrome --no-sandbox --repl "$@" 2> /dev/null \
        | tr -d '>>> ' | jq -r '.result.value'
    }
    

    Like so:

    $ real_url http://dx.doi.org/10.1016/j.pgeola.2020.06.005 
    https://www.sciencedirect.com/science/article/abs/pii/S0016787820300638?via%3Dihub
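
    For the simplest case, a static meta refresh tag, a browser is not strictly needed: the target can be pulled out of the HTML that curl downloads. A rough sketch under strong assumptions: the tag sits on a single line, http-equiv comes before content, and the URL inside content is not itself quoted. It does nothing for JavaScript redirects, and meta_refresh_url is a hypothetical name.

```shell
# Hypothetical helper: read HTML on stdin and print the target URL of the
# first <meta http-equiv="refresh" content="N; url=..."> tag, if any.
meta_refresh_url() {
    grep -i "<meta[^>]*http-equiv=[\"']\{0,1\}refresh" \
        | sed -n "s/.*;[ ]*[Uu][Rr][Ll]=\([^\"' >]*\).*/\1/p" \
        | head -n 1
}

# Usage (needs network access):
# curl -Ls http://example.com/page | meta_refresh_url
```

    Pattern matching on HTML is fragile, so the headless-browser approach remains the more reliable one.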
    