Using Tesseract in R to extract text from PDFs, I want to split or segment the text during the OCR process. Specifically, I want to isolate document headers to ease further te