How to mirror only a section of a website?


I cannot get wget to mirror a section of a website (a folder path below root) - it only seems to work from the website homepage.

I've tried many options.

4 Answers
  • 2020-12-22 16:09

    Use the --mirror (-m) and --no-parent (-np) options, plus a few other useful ones, as in this example:

    wget --mirror --page-requisites --adjust-extension --no-parent --convert-links \
         --directory-prefix=sousers http://stackoverflow.com/users
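
    If pages in that section also link upward or sideways into other parts of the site, a variant like the sketch below (using wget's --include-directories option; the /users path is simply carried over from the example above) keeps the crawl confined to one directory tree:

    wget --mirror --page-requisites --adjust-extension --convert-links \
         --no-parent --include-directories=/users \
         --directory-prefix=sousers http://stackoverflow.com/users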
    
  • 2020-12-22 16:15

    I usually use:

    wget -m -np -p $url
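
    For reference, -m is the short form of --mirror (recursive download with timestamping and infinite depth), -np of --no-parent (never ascend above the starting directory), and -p of --page-requisites (also fetch the images, CSS, and other assets needed to render each page). A rough long-option sketch with a placeholder URL:

    wget --mirror --no-parent --page-requisites http://www.example.com/some_directory/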
    
  • 2020-12-22 16:17

    Check out archivebox.io, an open-source, self-hosted tool that creates a local, static, browsable HTML clone of websites (it saves HTML, JS, media files, PDFs, screenshots, static assets, and more).

    By default, it only archives the URL you specify, but we're adding a --depth=n flag soon that will let you recursively archive links from the given URL.
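
    A minimal command-line sketch, assuming ArchiveBox is installed from PyPI and the init/add subcommands work as documented (check the current docs, as the CLI may have changed since this was written):

    pip install archivebox                                     # assumed PyPI package name
    archivebox init                                            # create an archive collection in the current directory
    archivebox add 'http://www.example.com/some_directory/'    # snapshot the given URL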

  • 2020-12-22 16:22

    I use pavuk for mirroring, as it seemed much better suited to this purpose from the start. You can use something like this:

    /usr/bin/pavuk -enable_js -fnrules F '*.php?*' '%o.php' -tr_str_str '?' '_questionmark_' \
                   -norobots -dont_limit_inlines -dont_leave_dir \
                   http://www.example.com/some_directory/ >OUT 2>ERR
    