Question
I would like to know how to crawl data inside a PDF file using Scrapy. Which module should I use, and what is the most effective way to do it? Could you point me to some sample tutorials on this?
Thanks!
Answer 1:
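I suggest downloading the PDF with Scrapy and using PyPDF2 to extract the content inside it. Scrapy only handles the HTTP side here; the PDF response body is raw bytes that a PDF library has to parse.
For a complete but somewhat dated example (it uses pyPDF rather than PyPDF2), take a look at this site.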
Source: https://stackoverflow.com/questions/31288217/scrapy-crawl-data-inside-pdf-file