UTF-8 string delimiter

悲&欢浪女 2021-01-21 00:30

I am parsing a binary protocol which has UTF-8 strings interspersed among raw bytes. This particular protocol prefaces each UTF-8 string with a short (two bytes) indicating the

3 Answers
  • 2021-01-21 00:52

    I would use a delimiter that starts with 0x11......, but if you also send raw bytes you will have to exclude this delimiter from the data/messages being processed. This means that if user input happens to match the delimiter, you will have to convert (escape) it.

    If the user inputs any character represented in UTF-8, you may simply send it as is.
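
A minimal sketch of the "convert it" step, assuming a byte-stuffing scheme. The delimiter value 0xFF, the escape value 0xFE, and the XOR mask are hypothetical choices for illustration, not part of the answer above; 0xFF and 0xFE are picked only because neither byte ever occurs in valid UTF-8, so only the raw-byte sections need escaping.

```python
# Hypothetical values, chosen for illustration only.
DELIM = 0xFF  # marks a boundary; never occurs in valid UTF-8
ESC = 0xFE    # escape byte; never occurs in valid UTF-8 either
MASK = 0x20   # XORed onto an escaped byte so it no longer equals DELIM or ESC


def stuff(raw: bytes) -> bytes:
    """Escape every DELIM or ESC byte in raw so the result contains no DELIM."""
    out = bytearray()
    for b in raw:
        if b in (DELIM, ESC):
            out.append(ESC)
            out.append(b ^ MASK)
        else:
            out.append(b)
    return bytes(out)


def unstuff(data: bytes) -> bytes:
    """Reverse stuff(): restore the bytes that were escaped."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == ESC:
            nxt = next(it, None)
            if nxt is None:
                raise ValueError("escape byte at end of data")
            out.append(nxt ^ MASK)
        else:
            out.append(b)
    return bytes(out)
```

The receiver can then split the stream on `DELIM` and call `unstuff` on each raw-byte section. The cost is that raw data can grow by up to a factor of two in the worst case.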

  • 2021-01-21 01:04

    UTF-8 is not normally delimited. You should be able to spot the multibyte characters in the stream by using the rules described here: http://en.wikipedia.org/wiki/UTF-8#Description
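
As a rough illustration of those rules, the sketch below classifies a single byte by its high bits. The function names are made up for this example. Note that arbitrary raw bytes can coincidentally form valid UTF-8 sequences, so this kind of check alone may not reliably tell where a string starts or ends inside a binary stream.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length implied by a lead byte, or 0 if the
    byte cannot start a valid UTF-8 sequence."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: single-byte (ASCII) character
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx (0xC0 and 0xC1 would be overlong, so excluded)
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx, capped at U+10FFFF
    return 0      # continuation byte (0x80-0xBF) or a byte never valid in UTF-8


def is_continuation(byte: int) -> bool:
    """True for bytes of the form 10xxxxxx."""
    return (byte & 0xC0) == 0x80
```

For example, the first byte of `"€".encode("utf-8")` is 0xE2, which announces a three-byte sequence, and the two bytes after it are continuation bytes.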

  • 2021-01-21 01:08

    I wouldn't call that delimiting; it is more like "length prefixing". Some people call these Pascal strings, because Pascal was one of the early popular languages that stored strings that way in memory.

    I don't think there's a formal standard specifically for just that, as it's a rather obvious way of storing UTF-8 strings (or any strings of bytes for that matter). It's defined over and over as a part of many standards that deal with messages that contain strings, though.
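
A minimal sketch of length prefixing as described above. It assumes the two-byte prefix is an unsigned big-endian count of the encoded bytes (not characters); the question does not state the byte order or the unit, so both are assumptions, and the function names are hypothetical.

```python
import struct


def write_prefixed_utf8(text: str) -> bytes:
    """Encode text as UTF-8 and prepend a 2-byte big-endian byte count."""
    data = text.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("string too long for a 2-byte length prefix")
    return struct.pack(">H", len(data)) + data


def read_prefixed_utf8(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Read one length-prefixed string starting at offset.

    Returns the decoded string and the offset of the first byte after it.
    """
    (length,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("truncated string payload")
    return buf[start:end].decode("utf-8"), end
```

Because the reader knows exactly how many bytes belong to the string, the surrounding raw bytes need no escaping:

```python
message = b"\x01\x02" + write_prefixed_utf8("héllo") + b"\xff"
text, next_offset = read_prefixed_utf8(message, 2)
```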
