What's the difference between utf8_general_ci and utf8_unicode_ci?


暗喜 2020-11-22 01:38

Between utf8_general_ci and utf8_unicode_ci, are there any differences in terms of performance?

8 answers

长发绾君心 (original poster)

2020-11-22 02:03
I wanted to know how much performance differs between utf8_general_ci and utf8_unicode_ci, but I could not find any benchmarks online, so I decided to run some myself.

I created a very simple table, which I later filled with 500,000 rows:
```
CREATE TABLE test(
  ID INT(11) DEFAULT NULL,
  Description VARCHAR(20) DEFAULT NULL
)
ENGINE = INNODB
CHARACTER SET utf8
COLLATE utf8_general_ci;
```
Then I filled it with random data by running this stored procedure:
```
CREATE PROCEDURE randomizer()
BEGIN
  DECLARE i INT DEFAULT 0;
  DECLARE random CHAR(20) ;
  theloop: LOOP
    SET random = CONV(FLOOR(RAND() * 99999999999999), 20, 36);
    INSERT INTO test VALUES (i+1, random);
    SET i=i+1;
    IF i = 500000 THEN
      LEAVE theloop;
    END IF;
  END LOOP theloop;
END
```
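For anyone reproducing this, here is a minimal sketch of how the fill step might be run and checked. It assumes the `test` table and the `randomizer()` procedure above already exist. In the mysql command-line client, the procedure definition itself usually has to be wrapped in `DELIMITER` commands so the semicolons inside the body do not end the statement early. Wrapping the call in one transaction is my own assumption, not something the original post did; it simply avoids committing each of the 500,000 inserts separately.

```sql
-- Assumption: test and randomizer() were created as shown above.
-- Run the fill inside a single transaction so InnoDB does not
-- commit after every single-row INSERT.
START TRANSACTION;
CALL randomizer();
COMMIT;

-- Sanity checks: row count and a small sample of the generated data.
SELECT COUNT(*) AS row_count FROM test;
SELECT ID, Description FROM test LIMIT 5;
```

The row count should match the loop limit in the procedure.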
Then I created the following stored procedures to benchmark simple SELECT, SELECT with LIKE, and sorting (SELECT with ORDER BY):
```
CREATE PROCEDURE benchmark_simple_select()
BEGIN
  DECLARE i INT DEFAULT 0;
  theloop: LOOP
    SELECT *
    FROM test
    WHERE Description = 'test' COLLATE utf8_general_ci;
    SET i = i + 1;
    IF i = 30 THEN
      LEAVE theloop;
    END IF;
  END LOOP theloop;
END;

CREATE PROCEDURE benchmark_select_like()
BEGIN
  DECLARE i INT DEFAULT 0;
  theloop: LOOP
    SELECT *
    FROM test
    WHERE Description LIKE '%test' COLLATE utf8_general_ci;
    SET i = i + 1;
    IF i = 30 THEN
      LEAVE theloop;
    END IF;
  END LOOP theloop;
END;

CREATE PROCEDURE benchmark_order_by()
BEGIN
  DECLARE i INT DEFAULT 0;
  theloop: LOOP
    SELECT *
    FROM test
    WHERE ID > FLOOR(1 + RAND() * (400000 - 1))
    ORDER BY Description COLLATE utf8_general_ci LIMIT 1000;
    SET i = i + 1;
    IF i = 10 THEN
      LEAVE theloop;
    END IF;
  END LOOP theloop;
END;
```
The stored procedures above are shown with the utf8_general_ci collation; for the tests I ran each of them with both utf8_general_ci and utf8_unicode_ci.
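As a sketch of what the second variant looks like, only the `COLLATE` clause inside each query changes. The statements below are the three benchmark queries with utf8_unicode_ci substituted. One assumption to flag: the connection character set has to be utf8 for the string literals to accept a utf8 collation, so the sketch sets it explicitly.

```sql
-- Assumption: the session talks utf8, so 'test' is a utf8 literal
-- and can take a utf8_* collation.
SET NAMES utf8;

-- Equality comparison under utf8_unicode_ci.
SELECT *
FROM test
WHERE Description = 'test' COLLATE utf8_unicode_ci;

-- Suffix match under utf8_unicode_ci.
SELECT *
FROM test
WHERE Description LIKE '%test' COLLATE utf8_unicode_ci;

-- Sorting under utf8_unicode_ci.
SELECT *
FROM test
WHERE ID > FLOOR(1 + RAND() * (400000 - 1))
ORDER BY Description COLLATE utf8_unicode_ci LIMIT 1000;
```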

I called each stored procedure 5 times for each collation (5 times for utf8_general_ci and 5 times for utf8_unicode_ci) and then calculated the average values.
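The post does not say how the run times were captured. One possible way to time a single call from a SQL session, offered only as a sketch, is to record a microsecond timestamp before the call and take the difference afterwards. The variable name `@started` is hypothetical, and `NOW(6)` requires a MySQL version with fractional-second support.

```sql
-- Record the start time with microsecond precision.
SET @started = NOW(6);

-- Run one benchmark procedure (defined above).
CALL benchmark_simple_select();

-- Elapsed wall-clock time in milliseconds for this call.
SELECT TIMESTAMPDIFF(MICROSECOND, @started, NOW(6)) / 1000 AS elapsed_ms;
```

Timed this way, the figure also includes the time the client spends receiving the result sets, so it is only comparable between runs made from the same client.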

My results are:

benchmark_simple_select()
- with utf8_general_ci: 9,957 ms
- with utf8_unicode_ci: 10,271 ms
In this benchmark using utf8_unicode_ci is slower than utf8_general_ci by 3.2%.

benchmark_select_like()
- with utf8_general_ci: 11,441 ms
- with utf8_unicode_ci: 12,811 ms
In this benchmark using utf8_unicode_ci is slower than utf8_general_ci by 12%.

benchmark_order_by()
- with utf8_general_ci: 11,944 ms
- with utf8_unicode_ci: 12,887 ms
In this benchmark using utf8_unicode_ci is slower than utf8_general_ci by 7.9%.
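The percentages are the relative increase of the utf8_unicode_ci average over the utf8_general_ci average. As a quick check, the arithmetic can be reproduced from the averages listed above (the column aliases are just illustrative names):

```sql
-- Relative slowdown in percent: 100 * (unicode - general) / general
SELECT
  ROUND(100.0 * (10271 - 9957) / 9957, 1)   AS simple_select_pct,
  ROUND(100.0 * (12811 - 11441) / 11441, 1) AS select_like_pct,
  ROUND(100.0 * (12887 - 11944) / 11944, 1) AS order_by_pct;
```

The three values should agree with the rounded figures quoted for each benchmark.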