Extracting content from a PDF using PHP

挽巷 2021-01-01 01:55

Could you please tell me how to extract content from a PDF document using PHP? Preserving the formatting is the main problem I'm facing, so please let me know if there are ways to extract the content with its formatting intact.

2 answers
  •  挽巷 (original poster)
    2021-01-01 02:48

    As far as I can see, it is not possible to convert a PDF to editable HTML on the fly with PHP while preserving the formatting. A number of desktop applications try to extract data from PDFs, with results that are sometimes more and sometimes less reliable. I would say this is not realistically possible at the moment; all you can do is extract plain text using Xpdf or other command-line tools.
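A minimal sketch of the plain-text route, assuming the `pdftotext` binary (shipped with Xpdf and Poppler) is installed and on the PATH, and that PHP is allowed to call `exec`. The helper names `buildPdftotextCommand` and `extractPdfText` are hypothetical, made up for this illustration. The `-layout` flag asks `pdftotext` to keep the physical layout of the page as far as it can, which helps with columns and simple tables but does not recover fonts, images, or styling.

```php
<?php
// Hypothetical helper: builds the pdftotext command line for a given PDF.
// Assumes a pdftotext binary (Xpdf or Poppler) is available on the PATH.
function buildPdftotextCommand(string $pdfPath, bool $keepLayout = true): string
{
    $parts = ['pdftotext'];
    if ($keepLayout) {
        // Keep the original physical layout of the text where possible.
        $parts[] = '-layout';
    }
    // Ask for UTF-8 output.
    $parts[] = '-enc';
    $parts[] = 'UTF-8';
    // Escape the path so spaces and shell metacharacters are safe.
    $parts[] = escapeshellarg($pdfPath);
    // "-" as the output file makes pdftotext write to stdout.
    $parts[] = '-';

    return implode(' ', $parts);
}

// Hypothetical helper: runs pdftotext and returns the extracted plain text.
function extractPdfText(string $pdfPath, bool $keepLayout = true): string
{
    if (!is_readable($pdfPath)) {
        throw new InvalidArgumentException("Cannot read PDF: {$pdfPath}");
    }

    $lines = [];
    $exitCode = 0;
    exec(buildPdftotextCommand($pdfPath, $keepLayout), $lines, $exitCode);

    if ($exitCode !== 0) {
        throw new RuntimeException("pdftotext failed with exit code {$exitCode}");
    }

    // exec() returns the output line by line; join it back into one string.
    return implode("\n", $lines);
}
```

Calling `echo extractPdfText('/path/to/document.pdf');` (a placeholder path) would then print the text of the document. Building the command in a separate function keeps the shell escaping in one place and makes it easy to check without a PDF at hand.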

    It may be different with that new XML-based PDF format, but I don't really know anything about it yet.

    Feel free to prove me wrong, of course - I'd be very interested myself if there were a solution.
