What's the difference between utf8_general_ci and utf8_unicode_ci?

暗喜 2020-11-22 01:38

Between utf8_general_ci and utf8_unicode_ci, are there any differences in terms of performance?

8 answers
  •  长发绾君心
    2020-11-22 02:03

    I wanted to know the performance difference between utf8_general_ci and utf8_unicode_ci, but I could not find any published benchmarks, so I decided to run my own.

    I created a very simple table, which I later filled with 500,000 rows:

    CREATE TABLE test(
      ID INT(11) DEFAULT NULL,
      Description VARCHAR(20) DEFAULT NULL
    )
    ENGINE = INNODB
    CHARACTER SET utf8
    COLLATE utf8_general_ci;
    

    Then I filled it with random data by running this stored procedure:

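    -- Note: in the mysql command-line client, change the statement delimiter
    -- (for example with DELIMITER $$) before defining a multi-statement procedure.
    CREATE PROCEDURE randomizer()
    BEGIN
      DECLARE i INT DEFAULT 0;
      DECLARE random CHAR(20);
      theloop: LOOP
        -- Read the digits of a random integer as base 20 and re-encode them
        -- in base 36 to get a short pseudo-random alphanumeric string.
        SET random = CONV(FLOOR(RAND() * 99999999999999), 20, 36);
        INSERT INTO test VALUES (i + 1, random);
        SET i = i + 1;
        IF i = 500000 THEN
          LEAVE theloop;
        END IF;
      END LOOP theloop;
    END;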
    

    Then I created the following stored procedures to benchmark simple SELECT, SELECT with LIKE, and sorting (SELECT with ORDER BY):

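    -- Runs an equality lookup 30 times. The table has no index on Description,
    -- so every run compares the literal against all rows.
    CREATE PROCEDURE benchmark_simple_select()
    BEGIN
      DECLARE i INT DEFAULT 0;
      theloop: LOOP
        SELECT *
        FROM test
        WHERE Description = 'test' COLLATE utf8_general_ci;
        SET i = i + 1;
        IF i = 30 THEN
          LEAVE theloop;
        END IF;
      END LOOP theloop;
    END;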
    
    -- Runs a suffix match (values ending in 'test') 30 times.
    CREATE PROCEDURE benchmark_select_like()
    BEGIN
      DECLARE i INT DEFAULT 0;
      theloop: LOOP
        SELECT *
        FROM test
        WHERE Description LIKE '%test' COLLATE utf8_general_ci;
        SET i = i + 1;
        IF i = 30 THEN
          LEAVE theloop;
        END IF;
      END LOOP theloop;
    END;
    
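    -- Sorts by Description 10 times and returns the first 1,000 rows.
    -- Note: MySQL evaluates RAND() in a WHERE clause once per row, so the filter
    -- keeps a random subset of rows rather than all rows above one threshold.
    CREATE PROCEDURE benchmark_order_by()
    BEGIN
      DECLARE i INT DEFAULT 0;
      theloop: LOOP
        SELECT *
        FROM test
        WHERE ID > FLOOR(1 + RAND() * (400000 - 1))
        ORDER BY Description COLLATE utf8_general_ci LIMIT 1000;
        SET i = i + 1;
        IF i = 10 THEN
          LEAVE theloop;
        END IF;
      END LOOP theloop;
    END;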
    

    The stored procedures above are shown with the utf8_general_ci collation, but for the tests I ran each of them with both utf8_general_ci and utf8_unicode_ci.
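
    For reference, here is a minimal sketch of what the utf8_unicode_ci variant of two of the benchmarked queries might look like. The SET NAMES utf8 line is an assumption that does not appear in the procedures: a utf8 collation can only be applied to a string literal whose character set is utf8, so the connection character set has to match.

    ```sql
    -- Assumption: make the connection character set utf8 so that the
    -- COLLATE clause on the literal below is valid.
    SET NAMES utf8;

    -- Same lookup as in benchmark_simple_select(), compared under utf8_unicode_ci.
    SELECT *
    FROM test
    WHERE Description = 'test' COLLATE utf8_unicode_ci;

    -- Same sort as in benchmark_order_by(), ordered under utf8_unicode_ci.
    SELECT *
    FROM test
    WHERE ID > FLOOR(1 + RAND() * (400000 - 1))
    ORDER BY Description COLLATE utf8_unicode_ci LIMIT 1000;
    ```

    Because the explicit COLLATE clause takes precedence over the column's own collation, the table can stay defined as utf8_general_ci while the comparison or sort runs under utf8_unicode_ci.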

    I called each stored procedure 5 times for each collation (5 times for utf8_general_ci and 5 times for utf8_unicode_ci) and then calculated the average values.
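
    How each call was timed is not stated here. One possible way to time a single call from a client session, sketched under that assumption, is to store a microsecond-precision timestamp in a user variable before the CALL and take the difference afterwards. The variable name @started is made up for this illustration.

    ```sql
    -- Record the start time with microsecond precision (MySQL 5.6.4 or later).
    SET @started = NOW(6);

    -- Run one benchmark; the client also receives every result set it produces.
    CALL benchmark_simple_select();

    -- Elapsed wall-clock time for the call, in milliseconds.
    SELECT TIMESTAMPDIFF(MICROSECOND, @started, NOW(6)) / 1000 AS elapsed_ms;
    ```

    Repeating these three statements five times per collation and averaging elapsed_ms would give numbers comparable in kind to the ones reported, although the measurement includes the time needed to send the result sets to the client.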

    My results are:

    benchmark_simple_select()

    • with utf8_general_ci: 9,957 ms
    • with utf8_unicode_ci: 10,271 ms

    In this benchmark, utf8_unicode_ci is slower than utf8_general_ci by 3.2%.

    benchmark_select_like()

    • with utf8_general_ci: 11,441 ms
    • with utf8_unicode_ci: 12,811 ms

    In this benchmark, utf8_unicode_ci is slower than utf8_general_ci by 12%.

    benchmark_order_by()

    • with utf8_general_ci: 11,944 ms
    • with utf8_unicode_ci: 12,887 ms

    In this benchmark, utf8_unicode_ci is slower than utf8_general_ci by 7.9%.
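
    The percentages follow from the averaged timings as (unicode - general) / general * 100. As a quick check, the arithmetic can be redone in a single query; the column aliases are only illustrative names.

    ```sql
    -- Relative slowdown of utf8_unicode_ci versus utf8_general_ci, in percent,
    -- computed from the averaged timings (in ms) reported above.
    SELECT
      ROUND((10271 - 9957)  * 100 / 9957,  1) AS simple_select_pct,
      ROUND((12811 - 11441) * 100 / 11441, 1) AS select_like_pct,
      ROUND((12887 - 11944) * 100 / 11944, 1) AS order_by_pct;
    ```

    Rounded to one decimal place, this should reproduce the 3.2%, 12% and 7.9% figures quoted for the three benchmarks.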
