Least used delimiter character in normal text < ASCII 128

执念已碎 2020-12-23 19:00

For coding reasons which would horrify you (I'm too embarrassed to say), I need to store a number of text items in a single string.

I will delimit them using a char

14 Answers
  • 2020-12-23 19:02

    Probably | or ^ or ~. You could also combine two characters.

  • 2020-12-23 19:03

    Can you use a pipe symbol? That's usually the next most common delimiter after comma or tab delimited strings. It's unlikely most text would contain a pipe, and ord('|') returns 124 for me, so that seems to fit your requirements.

  • 2020-12-23 19:09

    This can be good or bad (usually bad) depending on the situation and language, but keep in mind that you can always Base64 encode the whole thing. You then don't have to worry about escaping and unescaping various patterns on each side, and you can simply separate and split strings based on a character which isn't used in your Base64 charset.
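A minimal Python sketch of the Base64 idea, assuming UTF-8 text items and "|" as the separator (the names `pack` and `unpack` are made up for illustration):

```python
import base64

def pack(items, sep="|"):
    # "|" is not part of the standard Base64 alphabet (A-Z, a-z, 0-9, +, /, =),
    # so it can never appear inside an encoded item.
    return sep.join(
        base64.b64encode(item.encode("utf-8")).decode("ascii") for item in items
    )

def unpack(packed, sep="|"):
    # Note: an empty list and a list holding one empty string both pack to "",
    # and this sketch decodes that as [""].
    return [base64.b64decode(part).decode("utf-8") for part in packed.split(sep)]
```

The cost is that the stored string is about a third longer and no longer human-readable.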

    I have had to resort to this solution when faced with putting XML documents into XML properties/nodes. Properties can't have CDATA blocks in them at all, and nodes escaped as CDATA obviously cannot have further CDATA blocks inside that without breaking the structure.

    CSV is probably a better idea for most situations, though.
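For comparison, a sketch of the CSV route using Python's standard `csv` module, which handles the quoting of commas and quotes for you (the helper names are hypothetical):

```python
import csv
import io

def to_csv_line(items):
    # Write one CSV row into an in-memory buffer and drop the line terminator.
    buf = io.StringIO()
    csv.writer(buf).writerow(items)
    return buf.getvalue().rstrip("\r\n")

def from_csv_line(line):
    # Parse one CSV row back into a list; an empty line yields an empty list.
    return next(csv.reader(io.StringIO(line, newline="")), [])
```

For example, `to_csv_line(["a,b", "plain"])` quotes only the field that contains a comma.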

  • 2020-12-23 19:11

    For fast escaping I use something like this. Say you want to concatenate str1, str2 and str3; what I do is:

    delimitedStr = str1.Replace("@", "@a").Replace("|", "@p") + "|"
                 + str2.Replace("@", "@a").Replace("|", "@p") + "|"
                 + str3.Replace("@", "@a").Replace("|", "@p");
    

    then to retrieve original use:

    // Split on the delimiter, then undo the escapes: "@p" first, then "@a".
    splitStr = delimitedStr.Split('|');
    str1 = splitStr[0].Replace("@p", "|").Replace("@a", "@");
    str2 = splitStr[1].Replace("@p", "|").Replace("@a", "@");
    str3 = splitStr[2].Replace("@p", "|").Replace("@a", "@");
    

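    Note: the order of the Replace calls is important. When encoding, escape "@" before "|"; when decoding, restore "|" before "@".

    It round-trips any input, because every "@" and "|" in the data is escaped, and it is easy to implement.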
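The same escaping scheme can be sketched in Python for any number of items. The function names are invented for this illustration; the escape codes "@a" and "@p" are the ones used in the answer.

```python
def escape(item):
    # Escape the escape character "@" first, then the delimiter "|".
    return item.replace("@", "@a").replace("|", "@p")

def unescape(part):
    # Reverse order: restore "|" first, then "@".
    return part.replace("@p", "|").replace("@a", "@")

def join_items(items):
    # After escaping, no item contains a raw "|", so "|" is a safe delimiter.
    return "|".join(escape(item) for item in items)

def split_items(joined):
    return [unescape(part) for part in joined.split("|")]
```

Swapping the two calls in `unescape` would break an item such as "@p": it is stored as "@ap", which would first become "@p" and then wrongly turn into "|".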

  • 2020-12-23 19:13

    You said "printable", but that can include characters such as a tab (0x09) or form feed (0x0c). I almost always choose tabs rather than commas for delimited files, since commas can sometimes appear in text.

    (Interestingly enough, the ASCII table has the characters GS (0x1D), RS (0x1E), and US (0x1F) for group, record, and unit separators, whatever those are/were.)
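If the delimiter does not have to be typeable, the unit separator US (0x1F) mentioned above can be used directly. A small sketch, assuming the items are ordinary text that should never contain that control character (the helper names are hypothetical):

```python
US = "\x1f"  # ASCII unit separator

def join_fields(fields):
    # Refuse input that already contains the separator instead of corrupting it.
    for field in fields:
        if US in field:
            raise ValueError("field contains the unit separator (0x1F)")
    return US.join(fields)

def split_fields(joined):
    return joined.split(US)
```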

    If by "printable" you mean a character that a user could recognize and easily type in, I would go for the pipe | symbol first, with a few other weird characters (@ or ~ or ^ or \, or backtick which I can't seem to enter here) as a possibility. These characters +=!$%&*()-'":;<>,.?/ seem like they would be more likely to occur in user input. As for underscore _ and hash # and the brackets {}[] I don't know.
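Rather than guessing which character is rarest, you can count the candidates in a sample of your own data. A rough sketch, where the candidate list is only an assumption based on the characters named above:

```python
from collections import Counter

# Candidate delimiters taken from the discussion: | ^ ~ @ \ ` _ #
CANDIDATES = "|^~@\\`_#"

def rank_delimiters(sample, candidates=CANDIDATES):
    # Return the candidates ordered from least to most frequent in the sample.
    # Ties keep the order in which the candidates were listed.
    counts = Counter(sample)
    return sorted(candidates, key=lambda ch: counts[ch])
```

The first entry of the result is the least-used candidate in that sample; a count of zero there still says nothing about future input, so escaping remains the safer choice.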

  • 2020-12-23 19:15

    Pipe for the win! |
