Python to read pdf files

故里飘歌 2021-02-19 23:02

I have found many posts where solutions for reading PDFs have been proposed. I want to read a PDF file word by word and do some processing on it. People suggest pdfMiner which conver

3 answers
  •  旧时难觅i
    2021-02-19 23:28

    Possibly the fastest way to do this is to first convert your PDF into a text file using pdftotext (pdfMiner's site states that pdfMiner is 20 times slower than pdftotext) and then parse the text file as usual.
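
A minimal sketch of that approach, assuming the pdftotext command-line tool is installed and on your PATH. The function names `pdf_to_text` and `iter_words` are made up for illustration, and "word" here simply means a whitespace-separated token, which may not match what your processing needs:

```python
import subprocess
import sys
from typing import Iterator


def pdf_to_text(pdf_path: str) -> str:
    """Run the external pdftotext tool and return the extracted text."""
    # Passing "-" as the output file makes pdftotext write to stdout
    # instead of creating a .txt file next to the PDF.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")


def iter_words(text: str) -> Iterator[str]:
    """Yield whitespace-separated tokens one at a time."""
    for line in text.splitlines():
        yield from line.split()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_pdf_words.py some_file.pdf
    for word in iter_words(pdf_to_text(sys.argv[1])):
        print(word)  # replace with your own per-word processing
```

Splitting on whitespace keeps punctuation attached to words (for example "file," stays as one token), so you may want to strip or tokenize further depending on the processing you have in mind.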

    Also, when you said "I want to read a pdf file word by word and do some processing on it", you didn't specify whether you want to process the words extracted from the PDF or actually modify the PDF file itself. If it's the second case, then you've got an entirely different problem on your hands.
