Removing hidden characters from within strings

既然无缘 2020-12-01 10:37

My problem:

I have a .NET application that sends out newsletters via email. When the newsletters are viewed in Outlook, Outlook displays a question mark in place

8 answers
  • 2020-12-01 10:54

    You can remove all control characters from your input string with something like this:

    // requires: using System.Linq;
    string input = "abc\u0007def"; // your input string (sample value containing a BEL control character)
    string output = new string(input.Where(c => !char.IsControl(c)).ToArray());
    

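    See the documentation for the char.IsControl() method. Note that it matches only the Unicode control category (Cc), so it also removes tab, line feed and carriage return, and it leaves format characters such as the zero-width space (U+200B) in place.

    Or, if you want to keep letters and digits only, you can use the char.IsLetter and char.IsDigit methods instead: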

    string output = new string(input.Where(c => char.IsLetter(c) || char.IsDigit(c)).ToArray());
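
    To make the difference between the two filters concrete, here is a minimal sketch that wraps them in helper methods. The class and method names (ControlCharFilter, RemoveControlChars, KeepLettersAndDigits) are made up for illustration and are not part of .NET.

```csharp
using System;
using System.Linq;

public static class ControlCharFilter
{
    // Drops every character for which char.IsControl returns true
    // (Unicode category Cc: U+0000-U+001F and U+007F-U+009F).
    public static string RemoveControlChars(string input) =>
        new string(input.Where(c => !char.IsControl(c)).ToArray());

    // Keeps only letters and digits; spaces and punctuation are dropped as well.
    public static string KeepLettersAndDigits(string input) =>
        new string(input.Where(c => char.IsLetter(c) || char.IsDigit(c)).ToArray());
}
```

    For example, RemoveControlChars("a\u0000b\tc") gives "abc" (the tab is a control character too), while KeepLettersAndDigits("Hi, there 42!") gives "Hithere42".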
    
  • 2020-12-01 10:55

    I usually use this regular expression to replace all non-printable characters.

    By the way, most people consider tab, line feed and carriage return to be non-printable characters, but I don't, so the expression keeps them.

    So here is the expression:

    string output = Regex.Replace(input, @"[^\u0009\u000A\u000D\u0020-\u007E]", "*");
    
    • ^ at the start of the brackets negates the set: match any character that is NOT one of the following:
    • \u0009 is tab
    • \u000A is linefeed
    • \u000D is carriage return
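    • \u0020-\u007E means everything from space to ~ -- that is, every printable ASCII character.

    See an ASCII table if you want to adjust the ranges. Remember that this replaces every non-ASCII character as well; pass an empty string as the replacement to strip the characters instead of substituting *.

    To test the expression above, you can build a string containing the characters U+0000 through U+00FE like this: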

        string input = string.Empty;
    
        for (int i = 0; i < 255; i++)
        {
            input += (char)(i);
        }
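
    As a small sketch of how the expression behaves, the helper below compiles it once and lets the caller choose the replacement. The AsciiFilter class and its Replace method are hypothetical names used only for this example.

```csharp
using System.Text.RegularExpressions;

public static class AsciiFilter
{
    // Matches anything that is not tab, LF, CR, or printable ASCII (space to ~).
    private static readonly Regex NonPrintable =
        new Regex(@"[^\u0009\u000A\u000D\u0020-\u007E]");

    // Replaces every matched character with the given replacement string.
    public static string Replace(string input, string replacement) =>
        NonPrintable.Replace(input, replacement);
}
```

    With the input "a\u200Bb\tc\u00E9", AsciiFilter.Replace(input, "*") yields "a*b\tc*": the zero-width space and the é are both replaced, and the tab is kept. Passing "" instead yields "ab\tc".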
    
  • 2020-12-01 11:05

    I used this quick and dirty one-liner to clean some input from LTR/RTL marks left over by the broken Windows 10 calculator app. It's probably far from perfect, but it is good enough for a quick fix:

    string cleaned = new string(input.Where(c => !char.IsControl(c) && (char.IsLetterOrDigit(c) || char.IsPunctuation(c) || char.IsSeparator(c) || char.IsSymbol(c) || char.IsWhiteSpace(c))).ToArray());
    
  • 2020-12-01 11:07

    What worked best for me is:

    string result = new string(value.Where(c =>  char.IsLetterOrDigit(c) || (c >= ' ' && c <= byte.MaxValue)).ToArray());
    

    This keeps a character if it is a letter or digit in any script, so non-English letters survive. Otherwise it keeps the character only if it lies between space and byte.MaxValue (U+0020 to U+00FF), which preserves punctuation and drops the ASCII control characters below space. Note that this range extends beyond ASCII and still includes DEL (U+007F) and the C1 control characters (U+0080 to U+009F).

    Some suggest using IsControl to detect non-printable characters, but it does not catch characters such as the Left-To-Right mark (U+200E), which is a format character rather than a control character.
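
    One way to cover the Left-To-Right mark case is to filter on the Unicode category instead of IsControl. The sketch below is an alternative to the one-liner above, not a description of it; InvisibleCharFilter and RemoveControlAndFormat are hypothetical names.

```csharp
using System.Globalization;
using System.Linq;

public static class InvisibleCharFilter
{
    // Drops characters in the Control (Cc) and Format (Cf) categories.
    // LRM (U+200E) and RLM (U+200F) are Format characters, so they are removed.
    public static string RemoveControlAndFormat(string input) =>
        new string(input.Where(c =>
        {
            UnicodeCategory category = char.GetUnicodeCategory(c);
            return category != UnicodeCategory.Control
                && category != UnicodeCategory.Format;
        }).ToArray());
}
```

    RemoveControlAndFormat("1\u200E2\u200F3") returns "123". Be aware that this also removes tab and line breaks (they are Control characters) and format characters that can be meaningful, such as the soft hyphen or the zero-width joiner, so it may be too aggressive for some text.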

  • 2020-12-01 11:08

    It has been a while, but this hasn't been answered yet.

    How do you include the HTML content in the sending code? If you are reading it from a file, check the file encoding. If the file is saved as UTF-8 with signature (that is, with a byte order mark; the name varies slightly between editors), this may cause the odd character at the beginning of the mail.
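
    If the signature turns out to be the cause, a minimal sketch of two possible fixes looks like this: strip a leading U+FEFF from the text before sending, or write the file with an encoding that emits no signature. BomHelper, StripBom and Utf8NoBom are hypothetical names; UTF8Encoding and its constructor parameter are the real .NET API.

```csharp
using System.Text;

public static class BomHelper
{
    // Removes a single leading byte order mark (U+FEFF), if present.
    public static string StripBom(string text) =>
        text.Length > 0 && text[0] == '\uFEFF' ? text.Substring(1) : text;

    // UTF-8 encoding that does not emit the EF BB BF signature when writing.
    public static readonly Encoding Utf8NoBom =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
}
```

    The mark typically ends up in the string when the raw bytes are decoded directly, for example with Encoding.UTF8.GetString on a byte array that starts with EF BB BF; calling BomHelper.StripBom on the result removes it.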

  • 2020-12-01 11:10

    You can do this:

    var hChars = new char[] {...};
    var result = new string(yourString.Where(c => !hChars.Contains(c)).ToArray());
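
    The answer leaves the contents of hChars open. Purely as an illustration, the sketch below fills in a hypothetical set of commonly troublesome invisible characters and uses a HashSet for the lookup; the class name and the chosen characters are assumptions, so adjust the set to whatever actually appears in your data.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class HiddenCharList
{
    // Hypothetical example set, not taken from the original answer:
    // soft hyphen, zero-width space, LRM, RLM, and BOM / zero-width no-break space.
    private static readonly HashSet<char> Hidden = new HashSet<char>
    {
        '\u00AD', '\u200B', '\u200E', '\u200F', '\uFEFF'
    };

    // Removes every character that appears in the Hidden set.
    public static string Remove(string s) =>
        new string(s.Where(c => !Hidden.Contains(c)).ToArray());
}
```

    An explicit list like this only removes what you name, which makes it the most conservative of the approaches in this thread: accented letters, punctuation and line breaks are left untouched.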
    