I cannot get wget to mirror a section of a website (a folder path below root) - it only seems to work from the website homepage.
I've tried many options - here is o
Use the --mirror (-m) and --no-parent (-np) options, plus a few other useful ones, as in this example:
wget --mirror --page-requisites --adjust-extension --no-parent --convert-links \
    --directory-prefix=sousers http://stackoverflow.com/users
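One detail that often causes the "only works from the homepage" symptom described in the question: wget treats everything after the last slash as a filename, so --no-parent only confines the crawl to the folder when the starting URL ends in a trailing slash. A minimal sketch of the same command with that adjusted (the path is just the example from above):
# Trailing slash tells wget that /users/ is a directory, so --no-parent
# keeps the crawl inside it instead of wandering back up to the site root.
wget --mirror --page-requisites --adjust-extension --no-parent --convert-links \
    --directory-prefix=sousers http://stackoverflow.com/users/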
I usually use:
wget -m -np -p $url
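For reference, those short options expand to the long forms used in the other answer; here is a sketch with each flag spelled out in comments (the URL is a placeholder):
# -m  = --mirror          : recursive download with timestamping and infinite depth
# -np = --no-parent       : never ascend above the starting directory
# -p  = --page-requisites : also fetch the images/CSS/JS needed to render each page
wget -m -np -p http://www.example.com/some_directory/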
Check out archivebox.io; it's an open-source, self-hosted tool that creates a local, static, browsable HTML clone of websites (it saves HTML, JS, media files, PDFs, screenshots, static assets and more).
By default it only archives the URL you specify, but we're adding a --depth=n flag soon that will let you recursively archive links from the given URL.
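As a rough sketch of how that looks in practice (command names as I recall them from the ArchiveBox docs, so treat them as assumptions; the URL is a placeholder):
pip install archivebox                                     # install the CLI
archivebox init                                            # set up a new archive in the current (empty) directory
archivebox add 'http://www.example.com/some_directory/'    # snapshot that single URL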
I use pavuk for mirroring, as it has always seemed much better suited to this purpose. You can use something like this:
/usr/bin/pavuk -enable_js -fnrules F '*.php?*' '%o.php' -tr_str_str '?' '_questionmark_' \
-norobots -dont_limit_inlines -dont_leave_dir \
http://www.example.com/some_directory/ >OUT 2>ERR