How to remove OCR artifacts from text?
问题 OCR generated texts sometimes come with artifacts, such as this one: Diese grundsätzliche V e r b o r g e n h e i t Gottes, die sich n u r dem N a c h f o l g e r ö f f n e t , ist m i t d e m Messiasgeheimnis gemeint While it is not unusual, that the spacing between letters is used as emphasis (probably due to early printing press limitations), it is unfavorable for retrieval tasks. How can one turn the above text into a more, say, canonical form, like: Diese grundsätzliche Verborgenheit