Most reliable split character

忘掉有多难 2021-01-31 01:49

Update

If you were forced to use a single char on a split method, which char would be the most reliable?

Definition of reliable: a split charact

11 answers
  • 2021-01-31 02:43

    You can safely use whatever character you like as a delimiter, provided you escape each string so that it is guaranteed not to contain that character.

    Let's for example choose the character 'a' as the delimiter. (I intentionally picked a common character to show that any character can be used.)

    Use the character 'b' as the escape character. We replace every occurrence of 'a' with 'b1' and every occurrence of 'b' with 'b2':

    private static string Escape(string s) {
       // Escape 'b' first, so the 'b' introduced by "b1" is not escaped again.
       return s.Replace("b", "b2").Replace("a", "b1");
    }
    

    Now, the string doesn't contain any 'a' characters, so you can put several of those strings together:

    string msg = Escape("banana") + "a" + Escape("aardvark") + "a" + Escape("bark");
    

    The string now looks like this:

    b2b1nb1nb1ab1b1rdvb1rkab2b1rk
    

    Now you can split the string on 'a' and get the individual parts:

    b2b1nb1nb1
    b1b1rdvb1rk
    b2b1rk
    

    To decode the parts you do the replacement backwards:

    private static string Unescape(string s) {
       // Decode "b1" first, so a decoded 'b' followed by a literal '1' is not misread as an escape.
       return s.Replace("b1", "a").Replace("b2", "b");
    }
    

    So splitting the string and decoding the parts is done like this:

    string[] parts = msg.Split('a');
    for (int i = 0; i < parts.Length; i++) {
      parts[i] = Unescape(parts[i]);
    }
    

    Or using LINQ:

    string[] parts = msg.Split('a').Select<string,string>(Unescape).ToArray();
    

    If you choose a less common character as the delimiter, there are of course fewer occurrences that need to be escaped. The point is that the method makes sure that the character is safe to use as a delimiter without making any assumptions about what characters exist in the data that you want to put in the string.
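
    The steps above can be wrapped into a pair of helpers so that callers never see the escaped form. This is only a minimal sketch: the class name DelimitedCodec and the methods Join and SplitParts are hypothetical names picked for illustration, and it keeps the 'a'/'b' choice purely to match the example.

    ```csharp
    using System;
    using System.Linq;

    public static class DelimitedCodec
    {
        // Same scheme as above: 'a' is the delimiter, 'b' is the escape character.
        private static string Escape(string s) =>
            s.Replace("b", "b2").Replace("a", "b1");

        private static string Unescape(string s) =>
            s.Replace("b1", "a").Replace("b2", "b");

        // Escape each part, then join on the (now unambiguous) delimiter.
        public static string Join(params string[] parts) =>
            string.Join("a", parts.Select(p => Escape(p)));

        // Split on the delimiter, then undo the escaping of each part.
        public static string[] SplitParts(string msg) =>
            msg.Split('a').Select(p => Unescape(p)).ToArray();
    }
    ```

    One caveat of this sketch: an empty list and a list holding a single empty string both encode to the empty string, so that case would need separate handling if it matters.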

  • 2021-01-31 02:43

    First of all, in C# (or .NET), you can use more than one split character in a single split operation.

    From the String.Split Method (Char[]) reference, the separator parameter is:
    An array of Unicode characters that delimit the substrings in this instance, an empty array that contains no delimiters, or null reference (Nothing in Visual Basic).
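
    As a small illustration of that overload, the sketch below splits one line on either a comma or a semicolon. The class name MultiSplitDemo, the method SplitFields and the choice of delimiters are made up for the example.

    ```csharp
    using System;

    public static class MultiSplitDemo
    {
        // Any character in this array acts as a delimiter.
        private static readonly char[] Delimiters = { ',', ';' };

        public static string[] SplitFields(string line) =>
            line.Split(Delimiters);
    }
    ```

    For instance, MultiSplitDemo.SplitFields("red,green;blue") yields the three parts "red", "green" and "blue".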

    In my opinion, there is no single MOST reliable split character; however, some are more suitable than others.

    Popular split characters like tab, comma and pipe are good when the unsplit string/line needs to be viewed by a person.

    If it's only for storing/processing, the safer characters are probably those that are seldom used or those not easily entered from the keyboard.

    It also depends on the usage context. E.g. if you are expecting the data to contain email addresses, "@" is a no-no.

    Say we were to pick one from the ASCII set. There are quite a number to choose from, e.g. " ` ", " ^ " and some of the non-printable characters. Do beware, though, that not all characters are suitable. E.g. 0x00 might have an adverse effect on some systems.
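
    One concrete candidate among the non-printable characters is 0x1F, which ASCII defines as the unit separator. A minimal sketch, assuming the data never contains that character; the class and method names are hypothetical:

    ```csharp
    using System;

    public static class UnitSeparatorDemo
    {
        // ASCII 0x1F (unit separator): a control character that rarely appears in ordinary text.
        public const char Separator = '\u001F';

        // Join the fields with the separator between them.
        public static string Pack(params string[] fields) =>
            string.Join(Separator.ToString(), fields);

        // Split the packed string back into its fields.
        public static string[] Unpack(string packed) =>
            packed.Split(Separator);
    }
    ```

    This still relies on the assumption that 0x1F never occurs in the data; if that cannot be guaranteed, the input has to be validated or escaped first.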

  • 2021-01-31 02:50

    I'd personally say that it depends entirely on the situation. If you're writing a simple TCP/IP chat system, you obviously shouldn't use '\n' as the split character, but '\0' is a good choice because users can't ever type it!

  • 2021-01-31 02:51

    The "|" pipe sign is mostly used when you are passing several arguments to a method that accepts just a single string parameter. It is also widely used in SQL Server stored procedures, where you need to pass an array as a parameter. Ultimately, it depends on the situation in which you need it.

  • 2021-01-31 02:55

    \0 is a good split character. It's pretty hard (impossible?) to enter from a keyboard and it makes logical sense.

    \n is another good candidate in some contexts.

    And of course, .NET strings are Unicode, so there is no need to limit yourself to the first 255 characters. You can always use a rare Mongolian letter or some reserved or unused Unicode symbol.
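
    As one way to act on that idea, the sketch below uses U+E000, the first code point of the Unicode Private Use Area, and rejects input that already contains it. The choice of U+E000 and the names PrivateUseDelimiter, Pack and Unpack are assumptions made for illustration only.

    ```csharp
    using System;

    public static class PrivateUseDelimiter
    {
        // U+E000 is the first code point of the Private Use Area in the Basic Multilingual Plane.
        public const char Delimiter = '\uE000';

        public static string Pack(params string[] fields)
        {
            foreach (string field in fields)
            {
                // Fail loudly rather than produce an ambiguous string.
                if (field.IndexOf(Delimiter) >= 0)
                    throw new ArgumentException("Field contains the delimiter character.");
            }
            return string.Join(Delimiter.ToString(), fields);
        }

        // Split the packed string back into its fields.
        public static string[] Unpack(string packed) =>
            packed.Split(Delimiter);
    }
    ```

    The explicit check matters because a rare character is only unlikely in the data, not impossible.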
