I would like to save a web page programmatically.
I don't mean merely saving the HTML. I would also like to automatically store all associated files (images, CSS file
Take a look at wget, specifically the -p flag
-p, --page-requisites
    This option causes Wget to download all the files
    that are necessary to properly display
    a given HTML page. This includes such
    things as inlined images, sounds, and
    referenced stylesheets.
The following command:
wget -p http://<site>/1.html
will download 1.html and all the files it requires.
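If you want to drive this from your own program rather than from a shell, one option is to call wget as a subprocess. Here is a minimal Python sketch, assuming wget is installed and on the PATH. The function names and the default output directory are made up for illustration. The extra -k (--convert-links) and -P (--directory-prefix) flags are my additions, not part of the command above.

```python
import subprocess


def build_wget_command(url, out_dir):
    """Return the wget argument list for saving a page plus its requisites."""
    return [
        "wget",
        "-p",           # --page-requisites: images, stylesheets, etc.
        "-k",           # --convert-links: rewrite links to point at the local copies
        "-P", out_dir,  # --directory-prefix: where to put the downloaded files
        url,
    ]


def save_page_with_wget(url, out_dir="saved_page"):
    """Run wget; raises CalledProcessError if wget exits with a non-zero status."""
    subprocess.run(build_wget_command(url, out_dir), check=True)
```

Calling save_page_with_wget("http://<site>/1.html") with a real URL should leave the page and its requisites under saved_page/.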
You could try the MHTML format (which is what IE uses for its single-file web archives): http://en.wikipedia.org/wiki/MHTML
In other words, you'd download each object (image, CSS, etc.) to your computer and then "embed" them all, Base64-encoded, into a single file.
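To make the embedding step concrete, here is a rough Python sketch that packs an already-downloaded page and its resources into one MIME multipart/related message using the standard email package. The function name build_mhtml and the shape of the resources argument are my own invention. I have not checked that IE or any other browser will open the result, so treat it as an illustration of the idea rather than a conforming MHTML writer.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url, html, resources):
    """Bundle a page and its resources into a single multipart/related message.

    resources maps each absolute resource URL to a (content_type, data) pair,
    where data is the raw bytes that were downloaded for that URL.
    """
    archive = MIMEMultipart("related", type="text/html")

    # The HTML document itself is the first (root) part.
    root = MIMEText(html, "html", "utf-8")
    root["Content-Location"] = page_url
    archive.attach(root)

    # Every other object becomes a Base64-encoded part keyed by its URL.
    for url, (content_type, data) in resources.items():
        maintype, _, subtype = content_type.partition("/")
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
        part["Content-Location"] = url
        archive.attach(part)

    return archive.as_bytes()
```

You would write the returned bytes to a file such as page.mht. Downloading the resources in the first place is a separate job.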
On Windows, you can run IE as a COM object and pull everything out.
Alternatively, you can build on the Mozilla source code.
In Java, Lobo.
Or commons-httpclient and write a lot of code.
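To give an idea of what "a lot of code" means, here is a bare-bones sketch of the do-it-yourself route. It is in Python with only the standard library rather than Java with commons-httpclient, and all the names are hypothetical. It only looks at img and script src attributes and stylesheet link tags. It ignores things a real tool has to handle, such as url(...) references inside CSS, frames, link rewriting, and file name clashes.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class RequisiteParser(HTMLParser):
    """Collect absolute URLs of images, scripts and stylesheets."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        ref = None
        if tag in ("img", "script"):
            ref = attrs.get("src")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            ref = attrs.get("href")
        if ref:
            absolute = urljoin(self.base_url, ref)
            if absolute not in self.urls:  # skip duplicates, keep order
                self.urls.append(absolute)


def find_requisites(html, base_url):
    """Return the resource URLs referenced by html, resolved against base_url."""
    parser = RequisiteParser(base_url)
    parser.feed(html)
    return parser.urls


def save_page_with_requisites(url, out_dir):
    """Download the page and each referenced resource into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    with urlopen(url) as response:
        html_bytes = response.read()
    with open(os.path.join(out_dir, "index.html"), "wb") as f:
        f.write(html_bytes)
    html = html_bytes.decode("utf-8", errors="replace")  # assumes UTF-8 pages
    for res_url in find_requisites(html, url):
        # Naive naming: resources with the same base name overwrite each other.
        name = os.path.basename(urlparse(res_url).path) or "resource"
        with urlopen(res_url) as response:
            data = response.read()
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(data)
```

Even this much leaves the saved HTML pointing at the original URLs, which is why wget -p is usually the less painful answer.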