Copy+pasting text from PDF results in garbage

无人及你 2021-02-20 00:37

I am building an NLP system for my Master's thesis. One of its components is an extractor.

It extracts plain text from PDF files. There are a few PDF files that cannot be…

  •  借酒劲吻你
    2021-02-20 01:29

    What was the PDF created with? Some PDFs contain no character-encoding information at all, only the data needed to draw the glyphs, so there is no way to extract the text from them.
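    One common form of this, assumed here rather than confirmed for the asker's files, is a font that uses a custom or Identity encoding and has no `/ToUnicode` CMap, the PDF structure that maps character codes back to Unicode. As a rough first check, the stdlib-only Python sketch below counts font dictionaries and `/ToUnicode` references in a file, also looking inside Flate-compressed streams. The names `searchable_chunks` and `tounicode_report` are hypothetical, and the regex scan is a heuristic rather than a real PDF parser: it ignores encryption, non-Flate filters, and unusual object layouts.

```python
import re
import zlib

# A stream body follows a dictionary (">>") and the "stream" keyword and
# runs up to the end-of-line marker before "endstream".
STREAM_RE = re.compile(rb">>\s*stream\r?\n(.*?)\r?\nendstream", re.S)
# Matches "/Type /Font" but not "/Type /FontDescriptor".
FONT_RE = re.compile(rb"/Type\s*/Font\b")
TOUNICODE_RE = re.compile(rb"/ToUnicode\b")


def searchable_chunks(data: bytes):
    """Yield the raw file bytes, then every stream body that inflates with zlib.

    PDF 1.5+ files may keep font dictionaries inside Flate-compressed
    object streams, where a plain byte search would miss them.
    """
    yield data
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates trailing bytes after the zlib data
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded (or damaged): skip this stream


def tounicode_report(data: bytes) -> dict:
    """Heuristic count of font dictionaries vs. /ToUnicode references."""
    fonts = maps = 0
    for chunk in searchable_chunks(data):
        fonts += len(FONT_RE.findall(chunk))
        maps += len(TOUNICODE_RE.findall(chunk))
    return {"font_dicts": fonts, "tounicode_refs": maps}
```

    For example, `tounicode_report(open("problem.pdf", "rb").read())` (hypothetical file name) returning more font dictionaries than `/ToUnicode` references only means that some fonts rely on their encoding alone. Fonts with a standard encoding such as WinAnsiEncoding usually still extract correctly, and a Type0 font is counted twice because its descendant CIDFont is also a `/Type /Font` dictionary, so treat the numbers as a hint about which files to inspect more closely.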
