I want to be able to save / archive HTML pages as one file (without those pesky external folders).
I want the resulting file to contain all styles, images, and links (videos and Flash would be nice, too, but not as crucial).
I want the resulting file to be searchable and editable.
Microsoft's MHT is one such tool, but unfortunately it's not searchable under Linux. MHT is good, but I don't want to be locked into one operating system or one company. What would be a good alternative? Or perhaps there's an entirely different solution I haven't thought of?
Thank you in advance for your suggestions!
Have you tried Googling it? Third link down. http://cybernetnews.com/save-webpage-single-html-file/
Viewing and creating MHTML files in current versions of Google Chrome is supported by toggling the "Save Page as MHTML" option on the chrome://flags page.
Type chrome://flags in your URL bar.
However, enabling this experimental option disables saving pages as HTML-only or HTML Complete files, as the chrome://flags page warns.
The SingleFile chrome extension is a good solution.
I have also written my own python tool to solve this problem which I would recommend giving a try: https://github.com/zTrix/webpage2html
Extending zTrix's answer, I would suggest avoiding the Chrome extension (which did not work for me at all) and using one of these options instead:
- Node.js: remy's inliner
  - Easy to install using npm
  - Many options, including flags for disabling minification/compression, maintaining external images, skipping videos, and more.
  - Caveat (22 September 2017): fails to preserve styling and JavaScript functionality when compiling Slate builds. This won't affect most people directly, but it suggests that inliner may have similar issues with other pages. See this issue.
  - Caveat: no option to "leave things alone": it will either minify/uglify or beautify CSS/JS, but will not simply embed the original source in the HTML.
- Python 2: zTrix's webpage2html
  - More conservative than inliner; works well in most cases.
  - zTrix fixed a bug (which inliner also seems to have) so that JavaScript/CSS functionality is preserved when compiling Slate builds. See this issue. (Updated 29 September 2017)
  - Can be converted to Python 3 relatively painlessly.
  - Caveat: cannot handle CSS @import
It is usually possible to create a single HTML file that contains all of its common child resources (CSS, JPG, JS, SVG, ...).
To do this, rewrite the HTML file: replace the values of "src" attributes and the targets of CSS "url()" functions, and insert HTML tags such as "<script></script>" for JavaScript files, "<style></style>" for CSS files, and "<svg></svg>" for SVG images.
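A minimal Python sketch of the script/stylesheet part of this rewrite, assuming the files have already been downloaded. The function name inline_scripts_and_styles and the resources dictionary are hypothetical, the regular expressions only cover simple double-quoted tags, and SVG inlining is not shown.

```python
from __future__ import annotations

import re

# Simplified patterns: they assume double-quoted attributes and, for <link>,
# that rel="stylesheet" comes before href. A real tool should use an HTML parser.
SCRIPT_RE = re.compile(r'<script\b[^>]*\bsrc="([^"]+)"[^>]*>\s*</script>', re.IGNORECASE)
STYLESHEET_RE = re.compile(
    r'<link\b[^>]*\brel="stylesheet"[^>]*\bhref="([^"]+)"[^>]*>', re.IGNORECASE
)


def inline_scripts_and_styles(html: str, resources: dict[str, str]) -> str:
    """Replace external script/stylesheet references with inline <script>/<style> blocks.

    `resources` is a hypothetical stand-in for the download step: it maps each
    URL to the text of the downloaded file. Unknown URLs are left untouched.
    """

    def script_repl(m: re.Match) -> str:
        url = m.group(1)
        if url not in resources:
            return m.group(0)  # keep the original tag if we have no content for it
        return "<script>" + resources[url] + "</script>"

    def style_repl(m: re.Match) -> str:
        url = m.group(1)
        if url not in resources:
            return m.group(0)
        return "<style>" + resources[url] + "</style>"

    html = SCRIPT_RE.sub(script_repl, html)
    return STYLESHEET_RE.sub(style_repl, html)
```

Note that a script whose text contains "</script>" would end the inline block early, so a real tool has to escape it.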
For example, for a GIF image referenced in CSS through the "url()" function:
- Download the image from its URL.
- Encode the image in Base64.
- Replace "url('https://en.wikipedia.org/wiki/File:TPB_Magnet_Icon.gif')" with "url('data:image/gif;base64,R0lGODlhDAAMALMPAOXl5ewvErW1tebm5oocDkVFRePj47a2ts0WAOTk5MwVAIkcDesuEs0VAEZGRv///yH5BAEAAA8ALAAAAAAMAAwAAARB8MnnqpuzroZYzQvSNMroUeFIjornbK1mVkRzUgQSyPfbFi/dBRdzCAyJoTFhcBQOiYHyAABUDsiCxAFNWj6UbwQAOw')", that is, the Base64-encoded GIF image prefixed by "data:image/gif;base64,".
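The three steps above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a complete tool: the download step is replaced by a hypothetical resources dictionary (URL to file bytes and MIME type), and to_data_uri and inline_css_urls are illustrative names.

```python
from __future__ import annotations

import base64
import re

# Matches url(...), url('...') and url("..."); group 2 is the referenced URL.
URL_RE = re.compile(r"""url\(\s*(['"]?)([^'")]+)\1\s*\)""")


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data: URI, e.g. 'data:image/gif;base64,...'."""
    return "data:" + mime_type + ";base64," + base64.b64encode(data).decode("ascii")


def inline_css_urls(css: str, resources: dict[str, tuple[bytes, str]]) -> str:
    """Replace url(...) references in CSS with Base64 data URIs.

    `resources` stands in for the download step: it maps a URL to
    (file bytes, MIME type). URLs that are not in it are left unchanged.
    """

    def repl(m: re.Match) -> str:
        url = m.group(2)
        if url not in resources:
            return m.group(0)
        data, mime_type = resources[url]
        return "url('" + to_data_uri(data, mime_type) + "')"

    return URL_RE.sub(repl, css)
```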
You can do the same for the value of a "src" attribute.
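The same idea applied to the src attribute of <img> tags, again as a hypothetical regex-based sketch (inline_img_src and the resources dictionary are illustrative; a robust tool would use an HTML parser, since this simple pattern would also match attributes such as data-src).

```python
from __future__ import annotations

import base64
import re

# Group 1: everything up to "src=", group 2: the quote character, group 3: the URL.
IMG_SRC_RE = re.compile(r"""(<img\b[^>]*?\bsrc=)(['"])([^'"]+)\2""", re.IGNORECASE)


def inline_img_src(html: str, resources: dict[str, tuple[bytes, str]]) -> str:
    """Rewrite <img src="..."> values as Base64 data URIs.

    `resources` stands in for the download step: it maps a URL to
    (file bytes, MIME type). Unknown URLs are left unchanged.
    """

    def repl(m: re.Match) -> str:
        prefix, quote, url = m.groups()
        if url not in resources:
            return m.group(0)
        data, mime_type = resources[url]
        encoded = base64.b64encode(data).decode("ascii")
        return f"{prefix}{quote}data:{mime_type};base64,{encoded}{quote}"

    return IMG_SRC_RE.sub(repl, html)
```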
This approach can be used for other binary files too; just use the "data:" prefix with the MIME type that matches the encoded file (for example "data:image/png;base64," for a PNG image).
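To pick the right "data:" prefix automatically, Python's standard mimetypes module can guess the MIME type from the file extension. A sketch, assuming the extension is reliable (file_to_data_uri is a hypothetical helper name):

```python
import base64
import mimetypes


def file_to_data_uri(filename: str, data: bytes) -> str:
    """Build a data: URI, choosing the MIME type from the file extension."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        # Unknown extension: fall back to a generic binary type.
        mime_type = "application/octet-stream"
    return f"data:{mime_type};base64,{base64.b64encode(data).decode('ascii')}"
```

Guessing from the extension is a simplification; a downloader could instead use the Content-Type header returned by the server.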
Source: https://stackoverflow.com/questions/16169744/how-to-save-html-pages-as-one-file