MySQL: strange LENGTH() behaviour on utf8 string

Submitted by 偶尔善良 on 2021-02-08 12:18:52

Question


I am writing unit tests for query generators, and I am having trouble with the LENGTH function.

I run two queries, one after the other:

SHOW VARIABLES LIKE '%character%'

It returns the following result:

array(8) {
  [0] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_client"
    'Value' =>
    string(4) "utf8"
  }
  [1] =>
  array(2) {
    'Variable_name' =>
    string(24) "character_set_connection"
    'Value' =>
    string(4) "utf8"
  }
  [2] =>
  array(2) {
    'Variable_name' =>
    string(22) "character_set_database"
    'Value' =>
    string(6) "latin1"
  }
  [3] =>
  array(2) {
    'Variable_name' =>
    string(24) "character_set_filesystem"
    'Value' =>
    string(6) "binary"
  }
  [4] =>
  array(2) {
    'Variable_name' =>
    string(21) "character_set_results"
    'Value' =>
    string(4) "utf8"
  }
  [5] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_server"
    'Value' =>
    string(4) "utf8"
  }
  [6] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_system"
    'Value' =>
    string(4) "utf8"
  }
  [7] =>
  array(2) {
    'Variable_name' =>
    string(18) "character_sets_dir"
    'Value' =>
    string(26) "/usr/share/mysql/charsets/"
  }
}

My second query is:

SELECT LENGTH('重庆') as len

It returns 6 instead of 2.

What's wrong here? My charset settings look correct.


Answer 1:


I found the answer in the MySQL documentation:

The LENGTH function counts bytes. Each of the two characters 重 and 庆 is encoded as 3 bytes in UTF-8, so the result is 2 × 3 = 6:

mysql> SELECT LENGTH('重庆') ;
+------------------+
| LENGTH('重庆')   |
+------------------+
|                6 |
+------------------+
1 row in set (0.00 sec)

The CHAR_LENGTH function counts characters:

mysql> SELECT CHAR_LENGTH('重庆') ;
+-----------------------+
| CHAR_LENGTH('重庆')   |
+-----------------------+
|                     2 |
+-----------------------+
1 row in set (0.00 sec)
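The same distinction can be reproduced outside MySQL. Below is a minimal Python sketch, assuming the string literal reaches the server as UTF-8 (as `character_set_connection = utf8` in the question suggests). The helpers `mysql_length` and `mysql_char_length` are hypothetical names chosen for illustration, not part of any MySQL client library: the first counts encoded bytes, analogous to LENGTH(), and the second counts code points, analogous to CHAR_LENGTH().

```python
# Hypothetical helpers that mimic MySQL's LENGTH() and CHAR_LENGTH()
# for a string sent over a UTF-8 connection (an assumption, see above).

def mysql_length(s: str, encoding: str = "utf-8") -> int:
    """Number of bytes in the encoded string, like LENGTH()."""
    return len(s.encode(encoding))


def mysql_char_length(s: str) -> int:
    """Number of characters (code points), like CHAR_LENGTH()."""
    return len(s)


text = "重庆"
print(mysql_length(text))       # 6: each character is 3 bytes in UTF-8
print(mysql_char_length(text))  # 2: two characters
```

For a pure ASCII string such as `"abc"`, both helpers return 3, because every ASCII character is a single byte in UTF-8.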



Answer 2:


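The two functions measure different things:

LENGTH() always returns the length of the string in bytes, whereas CHAR_LENGTH() returns the length of the string in characters.

For strings made only of ASCII characters the two results are the same, but as soon as a string contains multi-byte characters they differ. In UTF-16, most common characters take two bytes; in UTF-8, the encoding used here, a character takes between one and four bytes, so the byte count depends on which characters the string contains. Each Chinese character in '重庆' takes three bytes in UTF-8.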

e.g.:

SELECT LENGTH('重庆'), CHAR_LENGTH('重庆');
-->   6,  2  
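To illustrate how the byte count varies with the characters involved, here is a small Python sketch that reports how many bytes a few sample characters take in UTF-8 and in UTF-16. The `byte_widths` helper and the sample characters are illustrative choices, not from the original answer. Note that MySQL's `utf8` character set is an alias for `utf8mb3`, which stores at most 3 bytes per character; 4-byte characters such as the emoji below require `utf8mb4`.

```python
from typing import Tuple


def byte_widths(ch: str) -> Tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for a string.

    "utf-16-le" is used so Python does not prepend a 2-byte BOM.
    """
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


# ASCII letter, Latin accented letter, CJK character, emoji
for ch in ["a", "é", "重", "😀"]:
    utf8_n, utf16_n = byte_widths(ch)
    print(f"U+{ord(ch):04X}  utf-8: {utf8_n}  utf-16: {utf16_n}")
```

In UTF-8 the widths grow from 1 to 4 bytes across these samples, while UTF-16 uses 2 bytes for the first three and 4 bytes (a surrogate pair) for the emoji.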


Source: https://stackoverflow.com/questions/16278898/mysql-strange-length-behaviour-on-utf8-string
