How do I save a web page, programmatically?

予麋鹿 2021-01-05 08:48

I would like to save a web page programmatically.

I don't mean merely saving the HTML. I would also like to automatically store all associated files (images, CSS file

3 answers
  • 2021-01-05 08:58

    Take a look at wget, specifically the -p flag

    -p  --page-requisites
    This option causes Wget to download all the files
    that are necessary to properly display
    a given HTML page. This includes such
    things as inlined images, sounds, and
    referenced stylesheets.
    

    The following command:

    wget -p http://<site>/1.html
    

    will download 1.html and all the files it requires.
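
If shelling out to wget is not an option, the core of what -p does (finding the files a page needs, then fetching them) can be sketched in Python with only the standard library. This is a rough illustration, not a replacement for wget: the names `RequisiteParser`, `find_page_requisites` and `save_page` are made up for this example, only `<img>`, `<script>` and stylesheet `<link>` tags are considered, the page is assumed to be UTF-8, and links inside the saved HTML are not rewritten.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class RequisiteParser(HTMLParser):
    """Collects absolute URLs of images, scripts and stylesheets in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        ref = None
        if tag in ("img", "script"):
            ref = attrs.get("src")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            ref = attrs.get("href")
        if ref:
            # Resolve relative references against the page URL.
            url = urljoin(self.base_url, ref)
            if url not in self.urls:
                self.urls.append(url)


def find_page_requisites(html, base_url):
    """Return the resource URLs referenced by html, in document order."""
    parser = RequisiteParser(base_url)
    parser.feed(html)
    return parser.urls


def save_page(url, out_dir):
    """Download a page and its requisites into out_dir (flat layout).

    Assumption: files are named after the last path segment of their URL,
    so two resources with the same file name overwrite each other.
    """
    os.makedirs(out_dir, exist_ok=True)
    with urlopen(url) as response:
        html_bytes = response.read()
    with open(os.path.join(out_dir, "index.html"), "wb") as f:
        f.write(html_bytes)
    html = html_bytes.decode("utf-8", errors="replace")
    for resource_url in find_page_requisites(html, url):
        name = os.path.basename(urlparse(resource_url).path) or "resource"
        with urlopen(resource_url) as response:
            data = response.read()
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(data)
```

Calling `save_page("http://<site>/1.html", "saved")` would write the HTML to `saved/index.html` and the referenced files next to it. Images referenced from inside CSS files, which wget -p also follows, are not handled here.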

  • 2021-01-05 09:04

    You could try the MHTML format (which is what IE uses to save a page as a single file): http://en.wikipedia.org/wiki/MHTML

    In other words, you would download each object (image, CSS, etc.) to your computer and then embed them all, Base64-encoded, into a single file.
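
As a rough sketch of that embedding step: an MHTML file is a MIME `multipart/related` message, so Python's standard `email` package can assemble one. The function name `build_mhtml` and the shape of the `resources` argument are assumptions made for this example, and the resources are expected to have been downloaded already. Real browsers may expect extra headers, so treat this as an illustration of the container format rather than a guaranteed-compatible writer.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url, html, resources):
    """Bundle a page and its resources into one multipart/related archive.

    resources maps each resource URL to a (content_type, data_bytes) pair,
    e.g. {"http://example.com/a.png": ("image/png", b"...")}.
    """
    archive = MIMEMultipart("related", type="text/html")

    # The HTML document is the first (root) part.
    root = MIMEText(html, "html", "utf-8")
    root["Content-Location"] = page_url
    archive.attach(root)

    # Every other object becomes a Base64-encoded part that is tagged
    # with the URL it was originally loaded from.
    for url, (content_type, data) in resources.items():
        maintype, subtype = content_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
        part["Content-Location"] = url
        archive.attach(part)

    return archive.as_bytes()
```

The returned bytes can be written to a file with an `.mht` or `.mhtml` extension. The `Content-Location` header on each part is what lets a reader match the URLs in the HTML to the embedded copies.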

  • 2021-01-05 09:07

    On Windows, you can run IE as a COM object and pull everything out.

    Another option is to take the Mozilla source code and build on it.

    In Java, there is Lobo.

    Or commons-httpclient and write a lot of code.
