Question
TLDR: group_concat(utf8 varchar) UNIONed with itself returns only group_concat_max_len / 3 ASCII characters, as if the character width were fixed instead of variable. The group_concat alone returns group_concat_max_len characters, as expected.
The problem
I have a table tabletest with a column data defined as a UTF8 varchar(2048).
There is only a single row in the table with 1050 ASCII characters in the column.
A group_concat over this table/column returns 1024 characters (equal to group_concat_max_len), which is expected.
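For comparison, here is a minimal Python sketch of the expected GROUP_CONCAT behaviour. It assumes the group_concat_max_len limit is applied to the UTF-8 byte length of the joined string, and group_concat_sketch is a hypothetical name, not a MySQL function. For ASCII data, bytes and characters coincide, so a single 1050-character row comes back as 1024 characters.

```python
def group_concat_sketch(values, separator=",", max_len=1024):
    """Hypothetical model of GROUP_CONCAT: join the values with the separator,
    then cut the result to max_len bytes of UTF-8 (assumed byte-based limit)."""
    joined = separator.join(values).encode("utf-8")
    # errors="ignore" drops a trailing partial multi-byte character, if any
    return joined[:max_len].decode("utf-8", errors="ignore")


# One row, so no separator is inserted; 1050 ASCII bytes are cut to 1024
print(len(group_concat_sketch(["a" * 1050])))  # 1024
# Several short rows stay below the limit and are joined with ","
print(group_concat_sketch(["x", "y", "z"]))  # x,y,z
```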
But a UNION of this group_concat with the same group_concat returns 341 characters (equal to group_concat_max_len / 3, rounded down).
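The number 341 is exactly what a fixed-width reading of the same byte budget gives. The Python calculation below assumes the 1024-byte limit is divided by the worst-case width of a MySQL utf8 character (3 bytes) instead of by the 1 byte an ASCII character actually needs; chars_that_fit is a hypothetical helper used only for illustration.

```python
GROUP_CONCAT_MAX_LEN = 1024  # bytes, the server setting from the question
UTF8_MAX_BYTES_PER_CHAR = 3  # assumed worst case for MySQL "utf8" (utf8mb3)
ASCII_BYTES_PER_CHAR = 1


def chars_that_fit(byte_budget: int, bytes_per_char: int) -> int:
    """Characters that fit into byte_budget if each takes bytes_per_char bytes."""
    return byte_budget // bytes_per_char


# Variable width, ASCII data: every byte is one character
print(chars_that_fit(GROUP_CONCAT_MAX_LEN, ASCII_BYTES_PER_CHAR))  # 1024
# Fixed worst-case width: the budget is reserved at 3 bytes per character
print(chars_that_fit(GROUP_CONCAT_MAX_LEN, UTF8_MAX_BYTES_PER_CHAR))  # 341
```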
Why does this happen?
According to the MySQL manual, Aggregate (GROUP BY) Function Descriptions:
The result type is TEXT or BLOB unless group_concat_max_len is less than or equal to 512, in which case the result type is VARCHAR or VARBINARY.
And from the MySQL manual, The BLOB and TEXT Types:
Similarly, you can regard a TEXT column as a VARCHAR column. BLOB and TEXT differ from VARBINARY and VARCHAR in the following ways:
- For indexes on BLOB and TEXT columns, you must specify an index prefix length. For CHAR and VARCHAR, a prefix length is optional. See Section 8.3.4, “Column Indexes”.
- BLOB and TEXT columns cannot have DEFAULT values.
So the return type should be TEXT, which is also variable-length and should store ASCII characters in utf8 as 1 byte each.
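To make the variable-width point concrete: in UTF-8 an ASCII character takes 1 byte, while other characters in the range that MySQL's 3-byte utf8 can store take 2 or 3 bytes. A short Python check (the sample characters are arbitrary):

```python
# Byte width of a few characters when encoded as UTF-8
samples = ["a", "\u00e9", "\u20ac"]  # 'a', 'é', '€'
widths = [len(ch.encode("utf-8")) for ch in samples]
print(widths)  # [1, 2, 3]

# The test row from the question: 1050 ASCII characters are exactly 1050 bytes
data = "a" * 1050
print(len(data), len(data.encode("utf-8")))  # 1050 1050
```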
Relevant answers acknowledging the problem
MySQL Truncating of result when using Group_Concat and Concat
Weird result for GROUP_CONCAT on subquery
Possible culprit
Again from the MySQL manual, The BLOB and TEXT Types:
Only the first max_sort_length bytes of the column are used when sorting. The default value of max_sort_length is 1024.
AFAIK, UNION needs to sort rows before eliminating duplicates, so this seemed a possible reason. But raising it with set max_sort_length=2048; did not change the returned character count.
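Taken literally, the quoted rule limits the sort key, not the stored value. The minimal Python sketch below assumes rows are compared only on their first max_sort_length bytes (sort_key is a hypothetical helper). Values that differ only after that prefix compare as equal, but every row keeps its full length, which is consistent with max_sort_length having no effect on the returned character count.

```python
MAX_SORT_LENGTH = 1024  # bytes, the default quoted above


def sort_key(value: str) -> bytes:
    """Hypothetical key: compare only the first MAX_SORT_LENGTH bytes of the value."""
    return value.encode("utf-8")[:MAX_SORT_LENGTH]


rows = ["a" * 1050 + "z", "a" * 1050 + "b"]
ordered = sorted(rows, key=sort_key)

# Both keys are 1024 x "a", so the (stable) sort keeps the original order
print(ordered == rows)  # True
# Sorting never shortens the rows themselves
print([len(r) for r in ordered])  # [1051, 1051]
```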
The only workaround seems to be SET group_concat_max_len = 1539; or higher. With 1538 or lower, only 512 characters or fewer are returned. Why this strange number?
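One model reproduces both 341 and the 1539 threshold. It is an assumption, not something confirmed from the MySQL source: the UNION result column is sized at group_concat_max_len // 3 characters. It becomes a VARCHAR of that length, truncating the value, when that is 512 characters or fewer. Above 512 it becomes a TEXT type, and no truncation happens at these sizes. The 512 cut-off mirrors the documented GROUP_CONCAT rule, here assumed to apply to characters instead of bytes. union_column and union_result_len are hypothetical helpers.

```python
UTF8_MAX_BYTES_PER_CHAR = 3  # assumed worst case for MySQL "utf8" (utf8mb3)
VARCHAR_CUTOFF_CHARS = 512   # assumed cut-off between VARCHAR and TEXT


def union_column(group_concat_max_len):
    """Hypothetical (type, length in characters) of the UNION result column."""
    chars = group_concat_max_len // UTF8_MAX_BYTES_PER_CHAR
    return ("VARCHAR", chars) if chars <= VARCHAR_CUTOFF_CHARS else ("TEXT", chars)


def union_result_len(value_len, group_concat_max_len):
    """Returned length for ASCII data under this model."""
    concat_len = min(value_len, group_concat_max_len)  # GROUP_CONCAT truncation
    col_type, col_chars = union_column(group_concat_max_len)
    return min(concat_len, col_chars) if col_type == "VARCHAR" else concat_len


def smallest_safe_max_len():
    """Smallest group_concat_max_len for which the column is no longer VARCHAR."""
    n = 1
    while union_column(n)[0] == "VARCHAR":
        n += 1
    return n


print(union_result_len(1050, 1024))  # 341
print(union_result_len(1050, 1538))  # 512
print(union_result_len(1050, 1539))  # 1050
print(smallest_safe_max_len())       # 1539
```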
Complete example
create database uniontest collate utf8_general_ci;
create table uniontest.tabletest (data varchar(2048));
insert into uniontest.tabletest select repeat('a',1050);
A simple select of the length of the 1050-character value:
select length(data) from uniontest.tabletest;
Output:
+--------------+
| length(data) |
+--------------+
| 1050 |
+--------------+
GROUP_CONCAT of the single row of 1050 characters (so no separators are added). The server configuration sets group_concat_max_len=1024:
select length(group_concat(data separator ',')) from uniontest.tabletest;
Output is truncated as expected:
+------------------------------------------+
| length(group_concat(data separator ',')) |
+------------------------------------------+
| 1024 |
+------------------------------------------+
Now UNION it with itself (in an attempt to prevent additional datatype conversions):
select length(data) from (
select group_concat(data separator ',') as data from uniontest.tabletest union
select group_concat(data separator ',') as data from uniontest.tabletest) d;
Unexpected result (expecting 1024):
+--------------+
| length(data) |
+--------------+
| 341 |
+--------------+
Tested on MySQL 5.6 and 5.7.
EDIT
I found a bug report about ORDER BY instead of UNION: "ORDER BY truncates GROUP_CONCAT result". It is marked Closed, so maybe only the ORDER BY case was fixed?
Source: https://stackoverflow.com/questions/47733920/mysql-group-concatutf8-in-union-truncated-to-1024-3