Python to read pdf files

故里飘歌 2021-02-19 23:02

I have found many posts where solutions for reading PDFs have been proposed. I want to read a PDF file word by word and do some processing on it. People suggest pdfMiner which conver

3 answers
  •  悲&欢浪女
    2021-02-19 23:22

    Although I really liked the pdfminer answer, packages change over time. Currently pdfminer still does not support Python 3 and may need to be updated. So, to update the subject, even though an answer has already been voted up, I'd suggest pdfrw. From its website:

    • Version 0.3 is tested and works on Python 2.6, 2.7, 3.3, 3.4, and 3.5
    • Operations include subsetting, merging, rotating, modifying metadata, etc.
    • The fastest pure Python PDF parser available
    • Has been used for years by a printer in pre-press production
    • Can be used with rst2pdf to faithfully reproduce vector images
    • Can be used either standalone, or in conjunction with reportlab to reuse existing PDFs in new ones
    • Permissively licensed
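
To give an idea of the "subsetting" operation listed above, here is a minimal sketch, assuming pdfrw is installed and that `PdfReader(...).pages`, `PdfWriter.addpage` and `PdfWriter.write` behave as in the pdfrw examples. The names `pick_pages`, `subset_pdf` and the file names are made up for illustration. As far as I can tell, pdfrw works on the structure of a PDF (pages, metadata) rather than extracting its text, so this does not by itself cover reading a file word by word.

```python
def pick_pages(pages, wanted):
    """Return the pages at the given 1-based positions, in the order requested.

    Hypothetical helper: works on any list, not only on pdfrw page objects.
    """
    return [pages[n - 1] for n in wanted]


def subset_pdf(src_path, dst_path, wanted):
    """Copy the selected pages of src_path into a new file at dst_path.

    Hypothetical helper; assumes pdfrw is installed (pip install pdfrw).
    """
    # Imported lazily so pick_pages stays usable without pdfrw.
    from pdfrw import PdfReader, PdfWriter

    reader = PdfReader(src_path)   # reader.pages is a list of page objects
    writer = PdfWriter()
    for page in pick_pages(reader.pages, wanted):
        writer.addpage(page)
    writer.write(dst_path)


# Example call (file names are placeholders):
# subset_pdf("input.pdf", "subset.pdf", [1, 3])
```

Keeping the page selection in a separate function means the indexing logic can be checked on a plain list before touching any real PDF.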
