How to remove control characters from a UTF-8 string

爱一瞬间的悲伤 2021-02-10 12:29

I have a VB.NET program that processes the content of documents. The program handles high volumes of documents as a "batch" (more than 2 million documents; 1 TB in total). Some of this doc…

2 Answers
  • 2021-02-10 13:04

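    The POSIX character class for control characters is [:cntrl:], which must be written inside a bracket expression, i.e. [[:cntrl:]] (see the "Regular expression" article on Wikipedia). Note that .NET's Regex does not support POSIX character classes; the closest .NET equivalent is \p{Cc}.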
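    A minimal sketch of what [[:cntrl:]] matches, written in Python purely for illustration. It assumes the "C"/POSIX locale, where the class covers U+0000 to U+001F plus U+007F. Python's re module has no POSIX classes, so the range is spelled out; the function name strip_ascii_controls is hypothetical.

```python
import re

# In the C/POSIX locale, [[:cntrl:]] covers U+0000-U+001F and U+007F.
# Python's re module has no POSIX classes, so the range is written explicitly.
ASCII_CNTRL = re.compile(r"[\x00-\x1f\x7f]+")


def strip_ascii_controls(text: str) -> str:
    """Remove the ASCII control characters matched by [[:cntrl:]] (C locale)."""
    return ASCII_CNTRL.sub("", text)


print(strip_ascii_controls("ab\x00c\x07d\te\x7f"))  # → abcde
```

    Tab, CR and LF fall inside this range, so they are stripped as well. C1 controls (U+0080 to U+009F) are not covered; \p{Cc} in .NET would also remove them.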

  • 2021-02-10 13:08

    Try

    ' Requires: Imports System.Text.RegularExpressions
    resultString = Regex.Replace(subjectString, "\p{C}+", "")
    

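    This removes every character in the Unicode "Other" category (\p{C}): control (Cc), format (Cf), private use (Co), surrogate (Cs), and unassigned (Cn). Tabs, carriage returns, and line feeds are control characters, so they are removed too.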
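    The same category-based filter, sketched in Python as an illustration rather than the answerer's code. Python's re module does not support \p{C}, so this version checks unicodedata.category() for each code point instead; the optional keep parameter (for example "\t\r\n") is a hypothetical addition so that line structure can survive. One caveat worth checking in .NET: strings there are UTF-16 and Regex matches individual code units, so \p{C} (which includes Cs) also deletes both halves of surrogate pairs, i.e. characters such as emoji. Python strings are sequences of code points, so that problem does not arise here.

```python
import unicodedata


def strip_other_chars(text: str, keep: str = "") -> str:
    """Drop code points whose general category starts with 'C'
    (Cc control, Cf format, Cs surrogate, Co private use, Cn unassigned),
    except characters listed in `keep` (hypothetical, e.g. "\t\r\n")."""
    return "".join(
        ch for ch in text
        if ch in keep or not unicodedata.category(ch).startswith("C")
    )


# U+200B is Cf, U+0000 and TAB are Cc, U+E000 is Co.
sample = "a\u200bb\x00c\ue000d\te"
print(strip_other_chars(sample))                   # → abcde
print(repr(strip_other_chars(sample, keep="\t")))  # → 'abcd\te'
```

    For a batch of millions of documents, a precompiled regex such as \p{Cc} (or a whitelist of the controls to keep) is usually cheaper than a per-character category lookup, but that is worth measuring on the real data.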
