Reading PDF per Line

南方客 2021-01-13 09:20

How can I read a PDF file line by line using iText5 for .NET? I have searched the internet, but I have only found ways to read a PDF's content page by page.

Ple

4 Answers
  • 2021-01-13 09:47

    Try this: use the LocationTextExtractionStrategy instead of the SimpleTextExtractionStrategy. It orders the text by its position on the page and puts newline characters between lines, so you can then call strText.Split('\n') to split the text into a string[] and consume it line by line.
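
    A minimal sketch of the splitting step, assuming the page text has already been extracted. In iText5 the per-page call is `PdfTextExtractor.getTextFromPage(reader, page, new LocationTextExtractionStrategy())` in Java, or `PdfTextExtractor.GetTextFromPage(...)` in the .NET port iTextSharp, with page numbers starting at 1. A strategy object accumulates everything it is fed, so the usual pattern is a fresh strategy per page. The code below is in Java and covers only the string handling, so it runs without iText; the class name and sample text are hypothetical. Unlike a bare `Split('\n')`, it also accepts `\r\n` endings and skips blank lines.

```java
import java.util.ArrayList;
import java.util.List;

public class PdfLineSplitter {

    // Splits the text of one page into lines. Accepts both "\n" and "\r\n"
    // line endings and drops lines that are empty or whitespace-only.
    static List<String> splitLines(String pageText) {
        List<String> lines = new ArrayList<>();
        for (String line : pageText.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the string returned by
        // getTextFromPage(reader, page, new LocationTextExtractionStrategy()).
        String pageText = "Chapter 1\nIt was a dark night.\n\nThe end.";
        List<String> lines = splitLines(pageText);
        for (int i = 0; i < lines.size(); i++) {
            System.out.println((i + 1) + ": " + lines.get(i));
        }
    }
}
```

    The same logic carries over to C# almost verbatim (`Regex.Split` or `Split` plus a filter on `string.IsNullOrWhiteSpace`).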

  • 2021-01-13 09:48

    If you are building an eBook reader for PDFs, you can either display each page exactly as it is, the way other PDF readers do, or extract the text and reformat it yourself.

    I prefer the second approach: format the text however looks best, because when I use an eBook reader I only care about the content, not about how the original page looked.
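
    One way to "read the text out and reformat yourself" is to join the extracted lines back into paragraphs and let the reader's own layout wrap them. A minimal sketch, assuming a blank line marks a paragraph break and a line ending in `-` is a word split across lines; both rules are hypothetical heuristics, and the second one also strips hyphens from genuine compounds that happen to end a line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Reflow {

    // Joins extracted lines into paragraphs: consecutive non-blank lines are
    // merged with a single space, and a blank line ends the current paragraph.
    // Heuristic: if the text so far ends in '-', the hyphen is dropped and the
    // next line is appended directly (treated as a word broken across lines).
    static List<String> toParagraphs(List<String> lines) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                if (current.length() > 0) {
                    paragraphs.add(current.toString());
                    current.setLength(0);
                }
                continue;
            }
            if (current.length() > 0) {
                int last = current.length() - 1;
                if (current.charAt(last) == '-') {
                    current.setLength(last); // drop the line-break hyphen
                } else {
                    current.append(' ');
                }
            }
            current.append(line);
        }
        if (current.length() > 0) {
            paragraphs.add(current.toString());
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "It was a dark and storm-", "y night. The rain", "fell.", "", "Chapter 2");
        for (String p : toParagraphs(lines)) {
            System.out.println(p);
        }
    }
}
```

    Running it prints "It was a dark and stormy night. The rain fell." followed by "Chapter 2".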

  • 2021-01-13 09:53

    You can find PDF2Text Pilot here; it is open-source software licensed under BSD.

    Although it is written in C++, it may serve as an inspiring starting point for solving your problem.

    I'm not proficient in C#, but perhaps you could call it from .NET through interop?

  • 2021-01-13 09:56

    I worked for an eBook reading company, and with PDFs we spent a lot of time and effort trying to recover the reading order of the text, because the reader could read aloud to you (with a bouncing dot following along). PDFs do not have to store text in line-by-line sequence. Books also contain many elements that are not part of the reading order, including page numbers, references, captions, examples, and multi-column layouts. It's a hard problem: at heart, PDF is a print format.
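
    To see why "lines" are not stored in a PDF, consider that a page's content places runs of text at coordinates, and an extractor has to rebuild lines from those positions. The sketch below is a simplified, hypothetical version of such position-based grouping, not iText's actual implementation: the `Chunk` type, the coordinates, and the 2-unit tolerance are all assumptions. Chunks whose baselines are close are merged into one line, lines run top to bottom (PDF y grows upward), and each line is read left to right. The demo page shows two of the problems described above: a two-column layout gets fused into single lines, and the page number comes out as ordinary text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LineGrouper {

    // Hypothetical positioned text fragment: (x, y) is where its baseline starts,
    // in PDF user space (origin at bottom-left, y grows upward).
    static final class Chunk {
        final float x, y;
        final String text;
        Chunk(float x, float y, String text) { this.x = x; this.y = y; this.text = text; }
    }

    // Groups chunks whose baselines lie within `tolerance` of the first chunk on
    // a line, orders lines top to bottom and chunks within a line left to right.
    static List<String> groupIntoLines(List<Chunk> chunks, float tolerance) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> -c.y)
                .thenComparingDouble((Chunk c) -> c.x));

        List<List<Chunk>> lines = new ArrayList<>();
        float lineY = Float.NaN;
        for (Chunk c : sorted) {
            if (lines.isEmpty() || Math.abs(c.y - lineY) > tolerance) {
                lines.add(new ArrayList<>());
                lineY = c.y; // baseline of the first chunk on the new line
            }
            lines.get(lines.size() - 1).add(c);
        }

        List<String> result = new ArrayList<>();
        for (List<Chunk> line : lines) {
            line.sort(Comparator.comparingDouble((Chunk c) -> c.x));
            StringBuilder sb = new StringBuilder();
            for (Chunk c : line) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(c.text);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical two-column page with a page number at the bottom.
        List<Chunk> page = Arrays.asList(
                new Chunk(300, 700, "Right column, line 1"),
                new Chunk(50, 700, "Left column, line 1"),
                new Chunk(50, 686, "Left column, line 2"),
                new Chunk(300, 686.5f, "Right column, line 2"),
                new Chunk(280, 40, "12"));
        for (String line : groupIntoLines(page, 2f)) {
            System.out.println(line);
        }
    }
}
```

    The output is "Left column, line 1 Right column, line 1", then "Left column, line 2 Right column, line 2", then "12": each line is geometrically correct, yet the reading order is wrong, which is exactly the gap a reading-order system has to close.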
