Extract PDF text by coordinates

半阙折子戏 2021-02-04 20:03

I'd like to know whether there is a PDF library for Microsoft .NET that can extract the text found at a given set of coordinates on a page.

For example (in pseudo-code):

6 answers
  •  我在风中等你
    2021-02-04 20:43

    The following code works with iText 7 for .NET (C#):

    using iText.Kernel.Geom;
    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Canvas.Parser;
    using iText.Kernel.Pdf.Canvas.Parser.Filter;
    using iText.Kernel.Pdf.Canvas.Parser.Listener;

    using (PdfDocument pdfDoc = new PdfDocument(new PdfReader("D:/Sample2.pdf")))
    {
        // Rectangle(x, y, width, height) in PDF user-space units (points, origin usually
        // at the page's lower-left corner): here the box from (208, 508) to (235, 519).
        Rectangle rect = new Rectangle(208, 508, 235 - 208, 519 - 508);
        // Let through only text whose baseline intersects the rectangle.
        TextRegionEventFilter regionFilter = new TextRegionEventFilter(rect);
        FilteredEventListener listener = new FilteredEventListener();
        LocationTextExtractionStrategy extractionStrategy =
            listener.AttachEventListener(new LocationTextExtractionStrategy(), regionFilter);
        new PdfCanvasProcessor(listener).ProcessPageContent(pdfDoc.GetPage(1));
        string text = extractionStrategy.GetResultantText();
    }
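
    One caveat: TextRegionEventFilter tests the baseline of each text chunk the page content draws, and a chunk can be a whole word or an entire line, so text lying partly outside the rectangle may still be returned. Below is a minimal sketch of a glyph-level variant, assuming iText 7 for .NET and its GlyphTextEventListener class, which splits each chunk into single glyphs before the filter is applied. The names RegionText and ExtractGlyphsInRegion are hypothetical, chosen only for illustration.

```csharp
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Filter;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical helper class, for illustration only.
public static class RegionText
{
    // Returns the text on one page inside the box (llx, lly)-(urx, ury),
    // testing each glyph separately rather than each whole text chunk.
    public static string ExtractGlyphsInRegion(string path, int pageNumber,
                                               float llx, float lly, float urx, float ury)
    {
        using (PdfDocument pdfDoc = new PdfDocument(new PdfReader(path)))
        {
            // Rectangle(x, y, width, height) in PDF user-space units.
            Rectangle rect = new Rectangle(llx, lly, urx - llx, ury - lly);

            // GlyphTextEventListener turns every RENDER_TEXT event into one event
            // per glyph before running the filter, so only glyphs whose own
            // baseline intersects the rectangle reach the wrapped strategy.
            ITextExtractionStrategy strategy = new GlyphTextEventListener(
                new LocationTextExtractionStrategy(), new TextRegionEventFilter(rect));

            // Processes the page content and returns the strategy's resultant text.
            return PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(pageNumber), strategy);
        }
    }
}
```

    For the region used above, the call would be RegionText.ExtractGlyphsInRegion("D:/Sample2.pdf", 1, 208, 508, 235, 519). The trade-off is that LocationTextExtractionStrategy then has to reassemble words from individual glyphs, so the inserted spaces depend on its gap heuristics.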
    
