I have the following query:
WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x ([Country], [Language])
)
SELECT
[Country],
CASE COUNT([Language])
WHEN 1 THEN MAX([Language])
WHEN 2 THEN STRING_AGG([Language], ' and ')
ELSE STRING_AGG([Language], ', ')
END AS [Languages],
COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]
I was expecting the value inside Languages column for Switzerland to be comma separated i.e.:
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French, German, Italian, English | 4
Instead I am getting the output below (the 4 values are separated by ' and '):
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French and German and Italian and English | 4
What am I missing?
Here is another example:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP BY y
| y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b | a+b
Is this a bug in SQL Server?
Yes, this is a Bug (tm), present in versions up to SQL Server 2017 CU16; it's fixed in CU17, as well as in Azure SQL Server and 2019 RC1. Specifically, the part of the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more often than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.
One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP BY y
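The same trick can be applied to the query from the question. This is only a sketch, assuming the affected versions behave as described above: the second STRING_AGG call gets '' + [Language] as its first argument, so the two calls should no longer be treated as the same expression.

```sql
WITH cteCountryLanguageMapping AS (
    SELECT * FROM (
        VALUES
            ('Spain', 'English'),
            ('Spain', 'Spanish'),
            ('Sweden', 'English'),
            ('Switzerland', 'English'),
            ('Switzerland', 'French'),
            ('Switzerland', 'German'),
            ('Switzerland', 'Italian')
    ) x ([Country], [Language])
)
SELECT
    [Country],
    CASE COUNT([Language])
        WHEN 1 THEN MAX([Language])
        WHEN 2 THEN STRING_AGG([Language], ' and ')
        -- '' + [Language] yields the same strings, but the argument
        -- no longer matches the one in the call above.
        ELSE STRING_AGG('' + [Language], ', ')
    END AS [Languages],
    COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country];
```

On a version that already contains the fix, the extra concatenation is harmless and can be removed.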
Don't repeat yourself*. You are repeating yourself by using MAX(...), STRING_AGG(..., ', ') and STRING_AGG(..., ' and '). You could simply rewrite your query like this and might end up with a better plan:
WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x (Country, Language)
), results AS (
SELECT
Country,
COUNT(Language) AS LanguageCount,
STRING_AGG(Language, ', ') AS Languages
FROM cteCountryLanguageMapping
GROUP BY Country
)
SELECT Country, LanguageCount, CASE LanguageCount
WHEN 2 THEN REPLACE(Languages, ', ', ' and ')
ELSE Languages
END AS Languages_Fixed
FROM results
Result:
| Country | LanguageCount | Languages_Fixed |
|-------------|---------------|----------------------------------|
| Spain | 2 | Spanish and English |
| Sweden | 1 | English |
| Switzerland | 4 | French, German, Italian, English |
* I did not want to repeat others as well by saying that it is a bug.
Source: https://stackoverflow.com/questions/52533487/string-agg-not-behaving-as-expected