How to resolve URL redirects?

北城以北 submitted on 2019-12-13 21:12:52

Question


I have a text file with many short URLs, one URL per line. I want to resolve each URL to its final link; some URLs are redirected twice. How can I automate this so that the output contains the final URLs, one URL per line?

Update: input text file:

http://www.example.com/go/post-page-1 
http://www.example.com/go/post-page-2 
http://www.example.com/go/post-page-3 

Output format needed in txt file:

http://www.example.org/post-page-name
http://www.example.org/post-page-name
http://www.example.org/post-page-name

Here is how the links are redirected:

Initial URL: http://www.example.com/go/post-page
    ==> 301 Permanent Redirect

Intermediate URL: http://click.affiliate.com/tracking?url=http://www.example.org/post-page-name
    ==> 302 Temporary Redirect

Final URL: http://www.example.org/post-page-name
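Each hop in this chain is an ordinary HTTP response that carries a `Location` header. A header dump from `curl -sIL`, which sends HEAD requests, follows redirects, and prints every response's headers, therefore contains the whole chain, and the last `Location` value is the final URL. The following is a minimal sketch that assumes absolute `Location` values. The helper name `last_location` and the sample headers are made up for illustration:

```shell
#!/bin/bash

# Hypothetical helper: read HTTP response headers on stdin and print the
# value of the last Location header. Header names are case-insensitive,
# and header lines end in CRLF, so carriage returns are stripped first.
last_location() {
    tr -d '\r' | grep -i '^location:' | tail -n 1 | sed 's/^[^:]*:[[:space:]]*//'
}

# Made-up headers shaped like the 301 -> 302 -> 200 chain described above.
sample=$'HTTP/1.1 301 Moved Permanently\r\n'
sample+=$'Location: http://click.affiliate.com/tracking?url=http://www.example.org/post-page-name\r\n\r\n'
sample+=$'HTTP/1.1 302 Found\r\n'
sample+=$'Location: http://www.example.org/post-page-name\r\n\r\n'
sample+=$'HTTP/1.1 200 OK\r\n\r\n'

printf '%s' "$sample" | last_location
# Against a live server (assumption: curl is installed):
#   curl -sIL "http://www.example.com/go/post-page-1" | last_location
```

A relative `Location` value, such as `/post-page-name`, would have to be resolved against the current URL, and this sketch does not do that.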

Here is the code I tried, but it resolves the URLs only to the intermediate link rather than to the final one.

#!/bin/bash
rm -f resolved_urls.txt
for url in $(cat url.txt); do
        wget -S "$url" 2>&1 | grep ^Location >> resolved_urls.txt
done
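The attempt above collects whatever `Location` lines wget prints, without distinguishing the first hop from the last. When wget follows a redirect, it prints its own `Location: URL [following]` message for that hop. Keeping only the last such message per input URL gives the last hop wget followed. The following is a minimal sketch built on that idea. It assumes wget's usual English message format (forced with `LC_ALL=C`) and discards the downloaded body with `-O /dev/null`. The function names are hypothetical:

```shell
#!/bin/bash

# Hypothetical helper: read wget's stderr on stdin and print the URL from the
# last "Location: URL [following]" line, i.e. the last hop wget followed.
# Prints nothing if wget reported no redirect.
last_wget_location() {
    grep '^Location: .* \[following\]$' | tail -n 1 \
        | sed -e 's/^Location: //' -e 's/ \[following\]$//'
}

# Hypothetical wrapper: fetch one URL, throw the body away, and print the
# final URL (or the input URL itself when there was no redirect).
wget_final_url() {
    local final
    final=$(LC_ALL=C wget -S -O /dev/null "$1" 2>&1 | last_wget_location)
    printf '%s\n' "${final:-$1}"
}

# Usage, one resolved URL per input line:
#   while read -r url; do wget_final_url "$url"; done < url.txt > resolved_urls.txt
```

If a server sends a relative `Location`, wget prints it as-is, so the result for that URL would also be relative.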

Answer 1:


It's not 100% clear what you're asking for, but based on what I'm seeing, and what I'm guessing, I think this will do it for you:

#!/bin/bash
# Read the input URLs from urls.txt (-i) and write wget's log,
# including the server responses (-S), to url-redirect.txt (-o).

wget -S -i urls.txt -o url-redirect.txt

# Grep for the "Final URL" lines and extract the URL (the third
# space-separated field) into resolved_urls.txt. This assumes the log
# contains lines in exactly the uniform "Final URL: <url>" format
# shown in the question.

grep 'Final URL' url-redirect.txt | cut -d ' ' -f3 > resolved_urls.txt

# Remove the temporary log file.
rm url-redirect.txt

This could probably be a lot faster without all the file redirection, but I think it does what you're looking for.
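Instead of parsing a combined log, you can let curl follow the whole chain itself with `-L` and report the URL it ended on through the `--write-out` variable `url_effective`, one input URL at a time. The following is a minimal sketch that reads `urls.txt` and writes `resolved_urls.txt` as in the answer above. The names `resolve_url` and `resolve_file` and the `--max-redirs 10` cap are illustrative choices, not part of either answer:

```shell
#!/bin/bash

# Follow redirects (-L, capped by --max-redirs), discard the body
# (-o /dev/null), and print the URL curl ended up at (%{url_effective}).
resolve_url() {
    curl -sL --max-redirs 10 -o /dev/null -w '%{url_effective}\n' "$1"
}

# Hypothetical driver: run a resolver command on every non-blank line of a
# file, printing one result per line. read's default IFS trims leading and
# trailing whitespace; "|| [ -n ... ]" keeps a last line with no newline.
resolve_file() {
    local resolver=$1 infile=$2 url
    while read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue
        "$resolver" "$url"
    done < "$infile"
}

# Usage:
#   resolve_file resolve_url urls.txt > resolved_urls.txt
```

Using GET with the body discarded, rather than `-I`, avoids surprises from servers that answer HEAD requests differently. The cost is that each final page is downloaded. A chain longer than the cap makes curl fail rather than loop forever.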




Answer 2:


Try something like this:

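#!/bin/bash

function getFinalRedirect {
    local url=$1
    local nextloc
    local hops=0
    # Stop after 20 hops so a redirect loop cannot run forever.
    while (( hops < 20 )); do
        # HEAD request (-I). Header names are case-insensitive (HTTP/2 sends
        # "location:") and header lines end in CRLF, so strip the \r.
        nextloc=$( curl -s -I "$url" | grep -i '^location:' | tr -d '\r' )
        if [ -n "$nextloc" ]; then
            # Drop the "Location: " prefix, whatever its case.
            url=${nextloc#*: }
            hops=$(( hops + 1 ))
        else
            break
        fi
    done

    echo "$url"
}

url="http://stackoverflow.com/q/25485374/1563512"
getFinalRedirect "$url"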

Beware of infinite redirects. This produces:

$ ./test.bash 
http://stackoverflow.com/questions/25485374/how-to-resolve-url-redirects

Then, to call the function on your file:

while read -r url || [ -n "$url" ]; do
    [ -n "$url" ] || continue    # skip blank lines
    getFinalRedirect "$url"
done < urls.txt > finalurls.txt
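As a variation on parsing the `Location` header in `getFinalRedirect`, curl can compute each next hop itself. When `-L` is not used, the `--write-out` variable `redirect_url` holds the URL a redirect would go to. curl resolves that URL against the current one, so relative `Location` values should be handled as well. The following is a minimal sketch with an explicit hop limit to guard against the infinite redirects warned about above. The names `next_hop` and `follow_redirects` and the default limit of 10 are illustrative:

```shell
#!/bin/bash

# Ask curl for the next hop only (no -L): %{redirect_url} is the URL a
# redirect would go to, or empty when the response is not a redirect.
next_hop() {
    curl -s -o /dev/null -w '%{redirect_url}' "$1"
}

# Hypothetical loop: follow hops one at a time, stopping when there is no
# further redirect or after max hops (default 10) to avoid redirect loops.
follow_redirects() {
    local url=$1 max=${2:-10} next i
    for (( i = 0; i < max; i++ )); do
        next=$(next_hop "$url")
        [ -n "$next" ] || break
        url=$next
    done
    printf '%s\n' "$url"
}

# Usage:
#   follow_redirects "http://stackoverflow.com/q/25485374/1563512"
```

Because `follow_redirects` calls `next_hop` by name, the loop can be exercised offline by redefining `next_hop` with a fixed table of redirects.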


Source: https://stackoverflow.com/questions/25485374/how-to-resolve-url-redirects
