I am writing a Master's thesis on an NLP system. One of its components is an extractor
that extracts plain text from PDF files. There are a few PDF files that cannot be
What was the PDF created with? Some PDFs do not contain any encoding information, only the data needed to draw the glyphs on the page, so there is no way to extract the text itself.
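
If it helps to find such files automatically, one rough option is to check whether the extractor's output looks like real text at all. The following is only a minimal sketch, not a definitive test. It assumes the extractor returns a plain string and marks unmapped glyphs with pdfminer-style `(cid:N)` placeholders or with control and private-use characters. The names `readable_ratio` and `looks_extractable` and the 0.8 threshold are hypothetical and would need tuning on your own files.

```python
import re
import unicodedata

# Assumption: unmapped glyphs show up as "(cid:N)" placeholders, as some
# pdfminer-style extractors emit them. Other tools may behave differently.
CID_MARKER = re.compile(r"\(cid:\d+\)")


def readable_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are letters, digits or punctuation."""
    # Collapse each placeholder into one replacement character so that it
    # counts as a single unreadable glyph.
    cleaned = CID_MARKER.sub("\ufffd", text)
    chars = [c for c in cleaned if not c.isspace()]
    if not chars:
        return 0.0
    # Unicode general categories starting with L, N or P are letters,
    # numbers and punctuation; control and private-use characters are not.
    good = sum(1 for c in chars if unicodedata.category(c)[0] in "LNP")
    return good / len(chars)


def looks_extractable(text: str, threshold: float = 0.8) -> bool:
    """Heuristic: True if the text is mostly readable characters."""
    return readable_ratio(text) >= threshold
```

Files for which `looks_extractable` returns False could then be set aside and handled separately instead of being fed into the rest of the pipeline.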