Vertical bar (|) Unicode replacement

We are using the vertical bar (|) character as a field separator in one of our modules, so users should not use this character in a title.

If they do use it, I would like to replace it with a similar character.

Is there a Unicode replacement for it? The only character I have found that looks similar is the broken vertical bar (¦).

I do not understand what you really need. Do you need to change the separator sequence to something guaranteed not to exist in the dataset?

If so, then that’s what Unicode’s 66 “non-character” code points are specifically designed for. You can use them as internal sentinels knowing that they cannot occur in valid data.
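A minimal Python sketch of that idea, assuming U+FDD0 is picked as the sentinel (any of the 66 would do). The names SENTINEL, is_noncharacter, join_fields and split_fields are made up for illustration and do not come from any library. Noncharacters are meant for internal use only, so this assumes the joined string never leaves your own code:

```python
SENTINEL = "\uFDD0"  # assumption: first noncharacter in the U+FDD0..U+FDEF range


def is_noncharacter(cp: int) -> bool:
    """True for the 66 code points Unicode reserves as noncharacters."""
    # 32 code points in U+FDD0..U+FDEF, plus the last two of each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def join_fields(fields):
    """Join fields with the sentinel, rejecting input that already holds a noncharacter."""
    for field in fields:
        if any(is_noncharacter(ord(ch)) for ch in field):
            raise ValueError("field contains a noncharacter code point")
    return SENTINEL.join(fields)


def split_fields(record):
    """Split a joined record back into its fields."""
    return record.split(SENTINEL)
```

With this, a title such as "a|b" passes through unchanged, because the vertical bar is no longer special.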

If you’re just looking for a visual lookalike, that’s very different. I would not suggest that, because there are lots of confusables. Here are just a few of those:

U+0007C ‭ |  GC=Sm SC=Common       VERTICAL LINE
U+000A6 ‭ ¦  GC=So SC=Common       BROKEN BAR
U+002C8 ‭ ˈ  GC=Lm SC=Common       MODIFIER LETTER VERTICAL LINE
U+002CC ‭ ˌ  GC=Lm SC=Common       MODIFIER LETTER LOW VERTICAL LINE
U+02016 ‭ ‖  GC=Po SC=Common       DOUBLE VERTICAL LINE
U+023D0 ‭ ⏐  GC=So SC=Common       VERTICAL LINE EXTENSION
U+02758 ‭ ❘  GC=So SC=Common       LIGHT VERTICAL BAR
U+02759 ‭ ❙  GC=So SC=Common       MEDIUM VERTICAL BAR
U+0275A ‭ ❚  GC=So SC=Common       HEAVY VERTICAL BAR
U+02AF4 ‭ ⫴  GC=Sm SC=Common       TRIPLE VERTICAL BAR BINARY RELATION
U+02AF5 ‭ ⫵  GC=Sm SC=Common       TRIPLE VERTICAL BAR WITH HORIZONTAL STROKE
U+02AFC ‭ ⫼  GC=Sm SC=Common       LARGE TRIPLE VERTICAL BAR OPERATOR
U+02AFE ‭ ⫾  GC=Sm SC=Common       WHITE VERTICAL BAR
U+02AFF ‭ ⫿  GC=Sm SC=Common       N-ARY WHITE VERTICAL BAR
U+0FF5C ‭ ｜ GC=Sm SC=Common       FULLWIDTH VERTICAL LINE
U+0FFE4 ‭ ￤ GC=So SC=Common       FULLWIDTH BROKEN BAR
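To see why a lookalike is a fragile fix, here is a rough Python sketch that describes a few of these characters with the standard unicodedata module and reports which of them occur in a title. LOOKALIKES is a hand-picked subset of the list above, and describe and find_lookalikes are hypothetical helper names. The script property (SC) is not available from unicodedata, so it is left out:

```python
import unicodedata

# Assumption: a hand-picked subset of the lookalikes listed above.
LOOKALIKES = "|\u00A6\u2016\u2758\uFF5C"


def describe(ch: str) -> str:
    """Format one character as code point, general category and name."""
    return f"U+{ord(ch):04X}  GC={unicodedata.category(ch)}  {unicodedata.name(ch)}"


def find_lookalikes(title: str):
    """Return the lookalike characters that occur in a title, in order."""
    return [ch for ch in title if ch in LOOKALIKES]


for ch in LOOKALIKES:
    print(describe(ch))
```

If you substitute one of these for | in user input, a user can still type the substitute itself, and you are back to the same ambiguity.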

user1254893

There's a 'light vertical bar' in Unicode: ❘, codepoint U+2758

http://www.fileformat.info/info/unicode/char/007c/index.htm

See Also:

latin letter dental click U+01C0
hebrew punctuation paseq U+05C0
divides U+2223
light vertical bar U+2758

Unicode, and indeed ASCII before it, has characters that are designed to be used for exactly your situation.

There are characters that are designed to be used as:

Unit Separator (␟): Between fields of a record, or members of a row
Record Separator (␞): End of a record or row

Those characters you see are the visual representations:

␟ U+241F - Symbol for unit separator
␞ U+241E - Symbol for record separator

Now in reality you aren't supposed to use those characters. The actual characters go back to the ASCII days:

Character         Symbol  ASCII  Unicode  Unicode name
----------------  ------  -----  -------  -------------------------
Unit separator        ␟  0x1F   U+001F   Information separator one
Record separator      ␞  0x1E   U+001E   Information separator two

Unfortunately the actual record separator and unit separator characters are unprintable:

Field separator:
Record separator:

Which is why it is nice that the symbols exist for those characters:

Field separator: ␟
Record separator: ␞

And nothing stops you from using those characters themselves:

AUD␟Australian dollar␟0.923
BRL␟Brazilian real␟0.3443
CNY␟Chinese renminbi␟0.1926
EUR␟European euro␟1.5009
JPY␟Japanese yen␟0.01229
MXN␟Mexican peso␟0.06894
NOK␟Norwegian krone␟0.154
RUB␟Russian ruble␟0.02074
CHF␟Swiss franc␟1.3448
GBP␟UK pound sterling␟1.6844
VND␟Vietnamese dong␟0.000057
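For comparison, here is a small Python sketch that uses the real control characters U+001F and U+001E for storage and only swaps in the visible symbols for display. The function names dump, load and visible are illustrative, and the sketch assumes the record separator sits between records rather than after each one:

```python
US = "\x1f"  # U+001F, unit separator: between fields
RS = "\x1e"  # U+001E, record separator: between records


def dump(rows):
    """Serialize rows of string fields using the control characters."""
    return RS.join(US.join(fields) for fields in rows)


def load(text):
    """Parse the serialized text back into a list of rows."""
    return [record.split(US) for record in text.split(RS)] if text else []


def visible(text):
    """Swap each separator for its Control Pictures symbol, for display only."""
    return text.replace(US, "\u241f").replace(RS, "\u241e")


rows = [["AUD", "Australian dollar", "0.923"], ["BRL", "Brazilian real", "0.3443"]]
print(visible(dump(rows)))
```

Because the stored data holds the control characters and not the symbols, a title containing |, ¦ or even ␟ round-trips without any replacement.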

I know you said you wanted something visually similar. But:

Stack Overflow is a wiki, where we add useful information
it's nice when there's an exact intended solution for a given problem

Source: https://stackoverflow.com/questions/10572627/vertical-bar-unicode-replacement

Tags: xml, unicode, csv, separator