How do I prevent wget from following redirects?
Use curl without -L instead of wget. Omitting that option prevents curl from following the redirect.
If you use curl -I <URL>, you'll get the headers instead of the redirect HTML. If you use curl -IL <URL>, you'll get the headers for the URL, plus those for the URL you're redirected to.
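As an aside, if you want to script against those headers, you can pull the redirect target out of the Location header. A minimal sketch, using made-up header text in place of real curl -I output:

```shell
# Made-up response headers, shaped like what `curl -I <URL>` prints
# for a redirect (real curl output uses CRLF line endings, hence the tr).
HEADERS='HTTP/1.1 301 Moved Permanently
Location: https://www.example.com/
Content-Length: 0'

# Extract the redirect target from the Location header.
TARGET=$(printf '%s\n' "$HEADERS" | tr -d '\r' | awk '/^Location:/ {print $2}')
echo "$TARGET"    # https://www.example.com/
```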
Some versions of wget have a --max-redirect option (see the wget manual):
--max-redirect 0
I haven't tried this; a value of 0 will either allow no redirects at all or allow an unlimited number.
In general, it is not a good idea to depend on a specific number of redirects.
For example, to download IntelliJ IDEA, the URL that is promised to always resolve to the latest Community Edition for Linux is something like https://download.jetbrains.com/product?code=IIC&latest&distribution=linux, but if you visit that URL nowadays, you are redirected twice before you reach the actual downloadable file. In the future you might be redirected three times, or not at all.
The way to solve this problem is to use the HTTP HEAD verb. Here is how I solved it in the case of IntelliJ IDEA:
# This is the starting URL.
URL="https://download.jetbrains.com/product?code=IIC&latest&distribution=linux"
echo "URL: $URL"
# Issue HEAD requests until the actual target is found.
# The result contains the target location, among some irrelevant stuff.
LOC=$(wget --no-verbose --method=HEAD --output-file=- "$URL")
echo "LOC: $LOC"
# Extract the URL from the result, stripping the irrelevant stuff.
URL=$(cut --delimiter=' ' --fields=4 <<< "$LOC")
echo "URL: $URL"
# Optional: download the actual file.
wget "$URL"
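To see what the cut step is doing in isolation, here it is against a sample log line. The line below is made up; the exact field layout of wget --no-verbose output is an assumption, so check it against your wget version:

```shell
# Made-up log line in the shape the script above expects from
# `wget --no-verbose --method=HEAD` (the field layout is an assumption).
LOC='2024-01-01 12:00:00 URL: https://example.com/latest.tar.gz 200 OK'

# Field 4, split on single spaces, is the resolved target URL.
URL=$(cut --delimiter=' ' --fields=4 <<< "$LOC")
echo "$URL"    # https://example.com/latest.tar.gz
```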
wget follows up to 20 redirects by default. However, it does not span hosts: if you have asked wget to download example.com, it will not touch any resources at www.example.com. wget detects this as a request to span to another host and decides against it.
In short, you should probably be executing:
wget --mirror www.example.com
rather than
wget --mirror example.com
Now let's say the owner of www.example.com has several subdomains at example.com and we are interested in all of them. How do we proceed?
Try this:
wget --mirror --domains=example.com example.com
wget will now visit all subdomains of example.com, including m.example.com and www.example.com.