How to convert PDF to HTML?

Posted by 蓝咒 on 2019-11-30 03:43:47

As I mentioned in the comment above, it is definitely possible to convert PDF to HTML using the tool Able2Extract 7.

I have been using this tool for almost 2 years now and I am pretty happy with it. It lets you convert PDF to Word, Excel, PowerPoint, Publisher, HTML, OpenOffice (OO) formats, etc.

Important note: this tool is not freeware.

HTH

moof2k

If you're on Linux, try pdftohtml:

sudo apt-get install poppler-utils
pdftohtml -enc UTF-8 -noframes infile.pdf outfile.html
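
If you'd rather call pdftohtml from a program than from a shell, one option is to spawn it as a child process. Below is a minimal sketch in Java, assuming pdftohtml is installed and on the PATH. The class and method names are made up for illustration; the flags are the same ones used in the command above.

```java
import java.io.IOException;
import java.util.List;

public class PdftohtmlRunner {
    // Builds the same command line as above:
    // pdftohtml -enc UTF-8 -noframes <pdfPath> <htmlPath>
    static List<String> buildCommand(String pdfPath, String htmlPath) {
        return List.of("pdftohtml", "-enc", "UTF-8", "-noframes", pdfPath, htmlPath);
    }

    // Runs pdftohtml and returns its exit code (0 conventionally means success).
    // Assumes the pdftohtml binary can be found on the PATH.
    static int convert(String pdfPath, String htmlPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdfPath, htmlPath))
                .inheritIO() // forward the tool's messages to our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: PdftohtmlRunner infile.pdf outfile.html");
            return;
        }
        int exitCode = convert(args[0], args[1]);
        System.out.println("pdftohtml exited with code " + exitCode);
    }
}
```

Passing the arguments as a list rather than one string avoids quoting problems when file names contain spaces.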

The open-source ebook converter Calibre can also convert PDF files to HTML and is available on macOS, Windows and Linux.

It is technically impossible to simply "convert" a PDF file to HTML. The PDF format is more like a "canvas" on which you "place" your text blocks and images at fixed positions, whereas HTML needs either CSS or a lot of tables to "place" the blocks. Moreover, PDF files embed their images, whereas HTML simply references other files.
There are many other differences, but essentially it's like asking to convert an image or a video that has text in it.
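
To make the "canvas" point concrete, here is a rough sketch of what a converter has to do with each positioned text block: turn its page coordinates into CSS. It is in Java, the TextBlock type and every other name are hypothetical, and it assumes coordinates in PDF points with the origin at the bottom-left of the page (the PDF default), so y has to be flipped for HTML.

```java
import java.util.List;
import java.util.Locale;

public class CanvasToHtml {
    // A piece of text at a fixed position on the page (hypothetical type).
    static final class TextBlock {
        final double x, y; // PDF points, origin at the bottom-left of the page
        final String text;

        TextBlock(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Escapes the characters that would otherwise be read as markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Emits one absolutely positioned <div> per block inside a page-sized container.
    static String toHtml(List<TextBlock> blocks, double pageWidth, double pageHeight) {
        StringBuilder html = new StringBuilder();
        html.append(String.format(Locale.ROOT,
                "<div style=\"position:relative;width:%.1fpt;height:%.1fpt\">%n",
                pageWidth, pageHeight));
        for (TextBlock b : blocks) {
            // HTML measures from the top, PDF from the bottom, so flip y.
            // This ignores font metrics, so the placement is only approximate.
            double top = pageHeight - b.y;
            html.append(String.format(Locale.ROOT,
                    "  <div style=\"position:absolute;left:%.1fpt;top:%.1fpt\">%s</div>%n",
                    b.x, top, escape(b.text)));
        }
        html.append("</div>");
        return html.toString();
    }

    public static void main(String[] args) {
        List<TextBlock> page = List.of(
                new TextBlock(72, 720, "Title"),
                new TextBlock(72, 690, "Body text with <angle brackets>"));
        System.out.println(toHtml(page, 612, 792)); // US Letter is 612 x 792 points
    }
}
```

Even this toy version shows why the result is rarely clean HTML: the output is a pile of positioned boxes with no notion of paragraphs, headings or reading order.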

You can, however, read a PDF file and extract the text and images from it, using libraries or other advanced techniques. .NET has a few libraries, for instance: http://forums.asp.net/post/2167442.aspx

If you only need to convert one file once, you can open the PDF file in Illustrator, for instance, and then export it as HTML. Or you can select the whole document (Ctrl+A), copy it, paste it into Word, and then save the result as HTML. It will be far from perfect, but it will be a start.

Sergio Muriel

Download

  • pdfbox-2.0.3.jar
  • fontbox-2.0.3.jar
  • preflight-2.0.3.jar
  • xmpbox-2.0.3.jar
  • pdfbox-tools-2.0.3.jar
  • pdfbox-debugger-2.0.3.jar

from http://pdfbox.apache.org/

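 import java.io.FileInputStream;
 import java.io.InputStream;
 import java.io.IOException;
 import org.apache.pdfbox.pdmodel.PDDocument;
 import org.apache.pdfbox.tools.PDFText2HTML;

 public class PdfToHtml {
     public static void main(String[] args) {
         // args[0] is the path of the PDF file; any other InputStream source works too.
         // try-with-resources closes the document and the stream even if conversion fails.
         try (InputStream is = new FileInputStream(args[0]);
              PDDocument pdd = PDDocument.load(is)) { // in-memory representation of the PDF document
             PDFText2HTML converter = new PDFText2HTML(); // the converter
             String html = converter.getText(pdd); // That's it!
             System.out.println(html);
         } catch (IOException ioe) {
             // the file could not be read or parsed
             ioe.printStackTrace();
         }
     }
 }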

Please note: Images do not get pushed to the HTML output.

Kjk

It's not that difficult to convert PDF to HTML. There are many online options, which may, however, expose your data to third parties. Follow these steps, and the output is great.

  1. Open the pdf2htmlEX page. (You can either follow the steps I describe below or follow the directions on that page.)

  2. Download the package for Windows.

    From the many options available, I recommend downloading "pdf2htmlEX-win32-0.14.6-upx-with-poppler-data.zip (pdf2htmlEx.exe is packed with UPX)"

  3. After downloading and unzipping, the conversion is just one cmd command away.

    C:\Users\kjk\Downloads\pdf2htmlEX-win32-0.14.6-upx-with-poppler-data>pdf2htmlEX.exe c:\1\abc.pdf
    

    Final Command:

    pdf2htmlEX.exe c:\1\abc.pdf
    

    (You can of course shorten the name of the folder, however, I kept it the same as you would see after un-zipping the download. I am assuming you can change the directory in cmd to the desired folder or else Google how.)

abc.pdf will be converted to HTML and saved as abc.html in the directory you ran the command from (here, the same folder as the exe).

Not sure whether this helps, but if you need a one-time conversion you can try this free online tool: https://www.readkong.com/

I have used this site several times. It produces HTML that looks identical to the original PDF. No ugly or broken markup, no HTML mashup and so on, even for very complex PDFs.

Yeah, it definitely is possible. If you're on Ubuntu Linux, install pdftohtml (it ships in the poppler-utils package):

sudo apt-get install poppler-utils

then

pdftohtml -c -noframes myFile.pdf myFile.htm

If you want to see what all the flags mean, then just type

pdftohtml

If you're not on Linux, there are a plethora of tools out there that you can use to make this happen.
