Extract PDF text by coordinates

半阙折子戏 2021-02-04 20:03

I'd like to know whether there is a PDF library for Microsoft .NET that can extract the text inside a region specified by coordinates.

For example (in pseudo-code):

<         


        
6 answers
  •  -上瘾入骨i
    2021-02-04 20:48

    iText's RegionTextRenderFilter is precisely what you're looking for.

    So you want something like this (forgive my Java, but it should be trivial to translate):

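    // iText 5.x (com.itextpdf.text.pdf.PdfReader, com.itextpdf.text.pdf.parser.*)
    PdfReader reader = new PdfReader(path);
    
    // Keep only the text chunks that fall inside someRect.
    TextExtractionStrategy regionStrategy = 
      new FilteredTextRenderListener( new SimpleTextExtractionStrategy(), 
                                      new RegionTextRenderFilter( someRect ) );
    // Page numbers are 1-based in iText.
    String regionText = PdfTextExtractor.getTextFromPage(reader, 1, regionStrategy );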
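
The snippet above leaves `someRect` undefined. As a rough sketch of one way to build it: PDF user space normally has its origin at the bottom-left corner of the page, with units of 1/72 inch, while coordinates taken from a viewer or screenshot are often measured from the top-left. The helper below, `RegionRects.fromTopLeft`, is hypothetical (it is not part of iText) and assumes an unrotated page whose media box starts at (0, 0). It returns a `java.awt.geom.Rectangle2D`, which `RegionTextRenderFilter` accepts in iText 5.x.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper (not part of iText): builds the rectangle to pass to
// RegionTextRenderFilter from coordinates measured from the page's top-left corner.
final class RegionRects {

    // All values are in PDF points (1/72 inch). pageHeight is the page height,
    // which in iText 5.x can be read with reader.getPageSize(pageNumber).getHeight().
    static Rectangle2D fromTopLeft(double left, double top, double width,
                                   double height, double pageHeight) {
        // PDF user space has y increasing upward from the bottom edge,
        // so the rectangle's lower edge sits at pageHeight - (top + height).
        double lowerY = pageHeight - (top + height);
        return new Rectangle2D.Double(left, lowerY, width, height);
    }

    public static void main(String[] args) {
        // A 200 x 50 pt box, 72 pt from the left and 100 pt from the top
        // of a US Letter page (792 pt tall).
        Rectangle2D someRect = fromTopLeft(72, 100, 200, 50, 792);
        System.out.println(someRect);
    }
}
```

If the coordinates are already expressed in PDF user space, no conversion is needed and the rectangle can be constructed directly.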
    
