Question
I have a text file with many short URLs, one per line. I want to resolve each URL to its final destination, and some of the URLs are redirected twice. How can I automate this so that the output is also one final URL per line?
Update: Input text file:
http://www.example.com/go/post-page-1
http://www.example.com/go/post-page-2
http://www.example.com/go/post-page-3
Required output format (text file):
http://www.example.org/post-page-name
http://www.example.org/post-page-name
http://www.example.org/post-page-name
Here is how the links are redirected:
Initial URL: http://www.example.com/go/post-page
==>301 Permanent Redirect
Intermediate URL: http://click.affiliate.com/tracking?url=http://www.example.org/post-page-name
==>302 Temporary Redirect
Final URL: http://www.example.org/post-page-name
Here is the code I tried. It resolves each URL to the intermediate link rather than to the final one:
#!/bin/bash
rm resolved_urls.txt
for url in $(cat url.txt); do
wget -S "$url" 2>&1 | grep ^Location >> resolved_urls.txt
done
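In the loop above, wget follows the redirects itself and logs one "Location: URL [following]" line per hop. As a result, "grep ^Location" can append every hop for a URL, not only the last one. For a URL that redirects twice, the output file may therefore contain the intermediate link as well as the final one. This assumes both hops are plain HTTP redirects.

Below is a minimal sketch of a hypothetical helper, last_location, that keeps only the final hop from that log. It assumes wget's English log messages.

```shell
#!/bin/bash
# Hypothetical helper: read wget's log on stdin and print the target of the
# last unindented "Location:" line, i.e. the final hop of the redirect chain.
# Assumes wget's English log line "Location: URL [following]"; the indented
# "  Location:" server headers printed by -S do not match the pattern.
last_location() {
  local line loc=""
  while IFS= read -r line; do
    case "$line" in
      Location:*)
        loc=${line#Location: }        # drop the "Location: " label
        loc=${loc%$'\r'}              # drop a trailing CR, if any
        loc=${loc% \[following\]}     # drop wget's " [following]" marker
        ;;
    esac
  done
  if [ -n "$loc" ]; then
    printf '%s\n' "$loc"
  fi
}
```

Hypothetical usage inside the question's loop: wget -S -O /dev/null "$url" 2>&1 | last_location >> resolved_urls.txt. The -O /dev/null option stops wget from saving each downloaded page to disk.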
Answer 1:
It's not entirely clear what you're asking for, but based on what I can see (and some guessing), I think this will do it for you:
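#! /bin/bash
# Use urls.txt as the input file for wget and write wget's log to
# url-redirect.txt; -O /dev/null discards the downloaded pages.
wget -S -O /dev/null -i urls.txt -o url-redirect.txt
# The log has no literal "Final URL" line, so pick the last hop of each chain.
# Every request starts with "--DATE TIME--  URL", and every followed redirect
# logs "Location: URL [following]". A request that is not preceded by a
# "[following]" line starts a new input URL, so the URL before it was final.
# (Assumes wget's English log messages.)
awk '/^--[0-9].*--  / { if (seen && !following) print last
                        last = $NF; seen = 1; following = 0; next }
     / \[following\]$/ { following = 1 }
     END { if (seen) print last }' url-redirect.txt > resolved_urls.txt
# Remove the temporary log file.
rm url-redirect.txt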
This could probably be a lot faster without all the redirects, but I think this satisfies what you're looking for.
Answer 2:
Try something like this:
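#!/bin/bash
# Follow redirects one hop at a time by reading each response's Location header.
function getFinalRedirect {
    local url=$1 nextloc hops=0
    # Stop after 20 hops so a redirect loop cannot run forever.
    while (( hops < 20 )); do
        # -I sends a HEAD request. grep -i also matches the lowercase
        # "location:" used by HTTP/2, and tr strips the trailing CR that
        # HTTP header lines end with. Assumes an absolute Location URL.
        nextloc=$(curl -s -I "$url" | grep -i '^location:' | tr -d '\r')
        if [ -n "$nextloc" ]; then
            url=${nextloc#*: }
            hops=$((hops + 1))
        else
            break
        fi
    done
    echo "$url"
}
url="http://stackoverflow.com/q/25485374/1563512"
getFinalRedirect "$url"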
Beware of infinite redirects. This produces:
$ ./test.bash
http://stackoverflow.com/questions/25485374/how-to-resolve-url-redirects
Then, to call the function on your file:
while IFS= read -r url; do
    getFinalRedirect "$url"
done < urls.txt > finalurls.txt
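A different technique is swapped in here; it is not taken from the answer above. curl can follow the whole chain itself with -L and report the last URL it requested via -w '%{url_effective}', so no manual loop over Location headers is needed.

Below is a minimal sketch with a hypothetical function name, resolve_all. It assumes the redirects are plain HTTP redirects; JavaScript or meta-refresh redirects are not followed.

```shell
#!/bin/bash
# Hypothetical helper: resolve every URL in a file (one per line) to its
# final destination, letting curl follow the redirect chain itself (-L).
# --max-redirs caps the chain so a redirect loop cannot run forever.
resolve_all() {
  local url
  # "|| [ -n "$url" ]" also handles a last line without a trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    url=${url%$'\r'}              # tolerate Windows (CRLF) line endings
    [ -z "$url" ] && continue     # skip blank lines
    curl -sL --max-redirs 10 -o /dev/null -w '%{url_effective}\n' "$url"
  done < "$1"
}
```

Usage: resolve_all urls.txt > finalurls.txt. This version sends a normal GET request rather than HEAD, because some servers answer HEAD requests differently from GET requests.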
Source: https://stackoverflow.com/questions/25485374/how-to-resolve-url-redirects