Why does filtering on a range match the wrong case when using a Case Sensitive collation?

坚强是说给别人听的谎言 提交于 2020-01-02 09:56:13

问题


SQL Server Standard 64 Bit with collation SQL_Latin1_General_CP1_CS_AS

Table plz:

ort varchar(30)    SQL_Latin1_General_CP1_CS_AS
select  ort,
from plz
where
   ort >= 'zürich' 
   and ort <= 'zürichz'

Selects this data:

Zürich
Zürich Mülligen
Zürich 80

Without the z at the end of second zürich no data are selected which is ok. But why does it show data on case sensitive server?


回答1:


When comparing strings, one of the first things that SQL Server does is to pad the shorter string with spaces so that the strings are of the same length. So basically you're query is trying to find any strings that match zürich and then the next character must be one that can appear between (from your first string) and z (from your second string) - which includes itself and most alphabetical characters.

This has nothing to do with case-sensitivity.

Without the z in your second string, the only strings that will match are those with the value zürich and just trailing spaces.




回答2:


There are two confusions happening here:

  • Equality vs Sorting

    When using the equality operator = the issue of case-sensitivity is more straight-forward and obvious (e.g. "z" <> "Z"). But when using the > and < operators sorting is required and this is separate from, though influenced by, case-sensitive vs case-insensitive. Sorting is determined first by which type of ordering (see next point) and then possibly determined by case-sensitivity.

  • Dictionary order vs Binary (i.e. ASCII value / Code Point) order

    Dictionary ordering, the default, means that a particular languages alphabetical ordering will be used. The particular language is part of the collation name, such as Latin1_General or Danish. This will put "a" (ASCII = 97) and "A" (ASCII = 65) together and "Z" (ASCII = 90) after both. A binary collation (i.e. one ending in _BIN or _BIN2) will use the binary character value, which will put "Z" between "A" and "a". A binary ordering is necessarily case-sensitive as it is everything-sensitive. A dictionary ordering will group "A" and "a" together, but it will only enforce that "A" comes before "a" if the collation is also case-sensitive. A case-insensitive dictionary ordering can intermix "A" and "a" values (see final SELECT in the example).

Now let's tie this all back to the question. The current collation is SQL_Latin1_General_CP1_CS_AS which is a Dictionary ordering. And when looking in a dictionary, both Zürich and zürich are going to be found back-to-back; Zürich would not come before zany simply because it has a capital Z.

Looking at the results returned by your query, the first entry, Zürich should not be there. I cannot get that to be returned. Since this is a case-sensitive collation, Zürich does come before zürich and would be filtered out by the WHERE clause. But the other two values, Zürich Mülligen and Zürich 80 are valid to be returned as they do match the WHERE clause. If you want them to not match, then you need to use a binary collation instead of the case-sensitive dictionary one.

To see this in action, run the following:

Dictionary (Case-InSensitive):

SELECT tmp.val
FROM (SELECT 'bat'
      UNION ALL
      SELECT 'bar'
      UNION ALL
      SELECT 'bad'
      UNION ALL
      SELECT 'Bar') tmp(val)
ORDER BY tmp.val COLLATE SQL_Latin1_General_CP1_CI_AS ASC;

Results:

bad
bar
Bar -- "Bar" typically, but not necessarily, comes after "bar"
bat

Dictionary (Case-Sensitive):

SELECT tmp.val
FROM (SELECT 'bat'
      UNION ALL
      SELECT 'bar'
      UNION ALL
      SELECT 'bad'
      UNION ALL
      SELECT 'Bar') tmp(val)
ORDER BY tmp.val COLLATE SQL_Latin1_General_CP1_CS_AS ASC;

Results:

bad
Bar -- "Bar" necessarily comes before "bar", but not before "bad, or even "b"
bar
bat

Binary:

SELECT tmp.val
FROM (SELECT 'bat'
      UNION ALL
      SELECT 'bar'
      UNION ALL
      SELECT 'bad'
      UNION ALL
      SELECT 'Bar') tmp(val)
ORDER BY tmp.val COLLATE Latin1_General_BIN2 ASC;

Results:

Bar -- "Bar" ("B" value = 66) necessarily comes before "b" (value = 98)
bad
bar
bat


To get a better sense of how case-sensitivity affects filtering and sorting in general, run the following step-by-step example.

DECLARE @test TABLE (string VARCHAR(30)
                         COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL);
INSERT INTO @test (string) values ('A');
INSERT INTO @test (string) values ('B');
INSERT INTO @test (string) values ('Y');
INSERT INTO @test (string) values ('Z');
INSERT INTO @test (string) values ('a');
INSERT INTO @test (string) values ('b');
INSERT INTO @test (string) values ('y');
INSERT INTO @test (string) values ('z');

-- case-sensitive, dictionary ordered sorting via column collation
SELECT [string] AS [Test1] FROM @test ORDER BY string ASC;
/* -- upper-case and lower-case of each letter are grouped together;
   -- upper-case consistently comes first due to case-sensitive
A
a
B
b
Y
y
Z
z
*/

-- case-sensitive comparison via column collation
SELECT [string] AS [Test2] FROM @test WHERE string = 'Y';
/* -- equality is straight-forward when using case-sensitive comparison
Y
*/

-- case-sensitive, dictionary ordered comparison and sorting via column collation
SELECT [string] AS [Test3] FROM @test WHERE string >= 'Y' ORDER BY string ASC;
/*  -- upper-case comes first due to case-sensitive
Y
y
Z
z
*/

-- case-sensitive, dictionary ordered comparison and sorting via column collation
SELECT [string] AS [Test4] FROM @test WHERE string >= 'y' ORDER BY string ASC;
/* -- upper-case comes first due to case-sensitive
y
Z
z
*/

-- case-insensitive, dictionary ordered comparison via collation override
SELECT [string] AS [Test5] FROM @test
WHERE string > 'Y' COLLATE SQL_Latin1_General_CP1_CI_AS;
/* -- upper-case and lower-case are the same due to case-INsensitive
Z
z
*/

-- binary ordering via collation override
SELECT [string] AS [Test6] FROM @test ORDER BY string COLLATE Latin1_General_BIN ASC;
/*
A
B
Y
Z
a
b
y
z
*/

-- case-sensitive, dictionary ordered comparison via column collation;
-- binary ordering via collation override
SELECT [string] AS [Test7] FROM @test WHERE string >= 'y'
   ORDER BY string COLLATE Latin1_General_BIN ASC;
/* -- lower-case 'y' comes before both 'Z' and 'z' when using a dictionary comparison
Z
y
z
*/

-- binary comparison via collation override;
-- binary ordering via collation override
SELECT [string] AS [Test8] FROM @test WHERE string >= 'y' COLLATE Latin1_General_BIN
   ORDER BY string COLLATE Latin1_General_BIN ASC;
/* -- lower-case 'y' comes after all capitals when using a binary comparison
y
z
*/

-- case-insensitive, dictionary ordered sorting via collation override
SELECT [string] AS [Test9] FROM @test
ORDER BY string COLLATE SQL_Latin1_General_CP1_CI_AS ASC;
/* -- upper-case and lower-case of each letter are grouped together,
   -- but inconsistent for upper-case vs lower-case first
A
a
b
B
Y
y
z
Z
*/

See the following links for some SQL Server collation info:

  • SQL Server Collation Name
  • Selecting a SQL Server Collation


来源:https://stackoverflow.com/questions/26176370/why-does-filtering-on-a-range-match-the-wrong-case-when-using-a-case-sensitive-c

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!