Unicode (hexadecimal) character literals in MySQL

遇见更好的自我 2021-01-05 15:21

Is there a way to specify Unicode character literals in MySQL?

I want to replace a Unicode character with an ASCII character, something like the following:



        
5 Answers
  • 2021-01-05 16:02

    You can write hexadecimal literals using the 0x, x'', or X'' notation (bit-value literals have their own 0b and b'' forms):

    select  0xC2A2;
    select x'C2A2';
    select X'C2A2';
    

    But be aware that the return type is a binary string, so each and every byte is considered a character. You can verify this with char_length:

    select char_length(0xC2A2)
    

    2

    If you want UTF-8 strings instead, you need to use convert:

    select convert(0xC2A2 using utf8mb4)
    

    And we can see that C2 A2 is considered 1 character in UTF-8:

    select char_length(convert(0xC2A2 using utf8mb4))
    

    1


    Invalid bytes are handled for you as well, because convert removes them automatically (the exact handling of invalid input can vary between MySQL versions, so check for warnings):

    select char_length(convert(0xC1A2 using utf8mb4))
    

    0

    As can be seen, the output is 0 because C1 A2 is an invalid UTF-8 byte sequence.
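
    The byte-level facts used in this answer can be cross-checked outside MySQL. What follows is a minimal Python sketch that is not part of the original answer and only illustrates UTF-8 itself. One difference to keep in mind: Python's strict decoder raises an error on invalid input instead of dropping bytes, so it does not reproduce what convert does with C1 A2.

```python
# Illustration only: checks the UTF-8 facts used above, independent of MySQL.

raw = bytes.fromhex("C2A2")      # the same two bytes as the literal 0xC2A2
text = raw.decode("utf-8")       # interpret those bytes as UTF-8

print(len(raw))                  # number of bytes in the binary string
print(len(text))                 # number of characters after decoding
print(f"U+{ord(text):04X}")      # code point of the decoded character

# The byte C1 can never start a valid UTF-8 sequence, so strict decoding fails.
try:
    bytes.fromhex("C1A2").decode("utf-8")
    c1a2_is_valid = True
except UnicodeDecodeError:
    c1a2_is_valid = False

print(c1a2_is_valid)
```

    The two lengths mirror the char_length results shown above: two bytes as a binary string, one character once the bytes are read as UTF-8.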

  • 2021-01-05 16:07

    There is also the char function, which does what you wanted: you pass it the byte values and a character set name, for example char(0xC2A2 using utf8mb4), and it returns the character.

  • 2021-01-05 16:11

    You can use the hex and unhex functions, e.g.:

    update mytable set myfield = unhex(replace(hex(myfield),'C383','C3'))
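
    One caveat with this approach: replace works on the hex digits as text, not on bytes, so a match can start in the middle of a byte. The Python sketch below is not from the original answer; it mimics the hex, replace, unhex round trip with two hypothetical helpers (hex_text_replace and byte_replace) to show the difference on the harmless ASCII string "L83", whose hex digits 4C3833 happen to contain C383.

```python
# Illustration only: why a text replace on a hex dump can damage unrelated data.

def hex_text_replace(data: bytes, old_hex: str, new_hex: str) -> bytes:
    """Mimic unhex(replace(hex(x), old, new)): the replace sees hex digits."""
    return bytes.fromhex(data.hex().upper().replace(old_hex, new_hex))


def byte_replace(data: bytes, old_hex: str, new_hex: str) -> bytes:
    """Replace whole bytes only, so a match always starts on a byte boundary."""
    return data.replace(bytes.fromhex(old_hex), bytes.fromhex(new_hex))


sample = b"L83"  # bytes 4C 38 33, i.e. hex digits "4C3833"

# The digit-level replace matches "C383" across the byte boundaries.
print(hex_text_replace(sample, "C383", "C3"))

# The byte-level replace finds no C3 83 byte pair and leaves the data alone.
print(byte_replace(sample, "C383", "C3"))
```

    Both helpers agree when the data really contains the byte pair C3 83; they differ only when the digit pattern straddles byte boundaries, which is the case worth checking for before running the update on real data.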
    
  • 2021-01-05 16:20

    Thanks for your suggestions, but I think the problem was further back in the system.

    There are a lot of layers to unpick, but as far as I can tell (on this server at least), the command

    set names utf8
    

    makes the utf-8 handling work correctly, whereas

    set character set utf8
    

    doesn't.

    In my environment, these are being called from PHP using PDO, in case that makes a difference.

    Thanks anyway!

  • 2021-01-05 16:24

    The MySQL string literal syntax is specified in the MySQL documentation; as you can see there, it makes no provision for numeric escape sequences.

    However, as you are embedding the SQL in PHP, you can compute the right bytes in PHP. Make sure the bytes you put into the SQL actually match your client character set.
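
    The idea of computing the bytes on the client side can be sketched as follows. The answer refers to PHP; Python is used here only as a stand-in, and mysql_hex_literal is a hypothetical helper name, not an existing API. The encoding argument is an assumption that has to match your client character set, as noted above.

```python
# Illustration only: build a MySQL hex literal for one Unicode code point.

def mysql_hex_literal(code_point: int, encoding: str = "utf-8") -> str:
    """Encode one code point and format its bytes as a 0x... literal.

    Hypothetical helper; `encoding` must match the client character set.
    """
    encoded = chr(code_point).encode(encoding)
    return "0x" + encoded.hex().upper()


print(mysql_hex_literal(0x00A2))             # cent sign, encoded as UTF-8
print(mysql_hex_literal(0x20AC))             # euro sign, encoded as UTF-8
print(mysql_hex_literal(0x00A2, "latin-1"))  # same cent sign, single-byte charset
```

    The last call shows why the character set matters: the same character yields different bytes, and therefore a different literal, under a different encoding.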
