Could string comparisons really differ based on culture when the string is guaranteed not to change?

心已入冬 提交于 2019-11-28 18:32:14

Absolutely. Per MSDN (http://msdn.microsoft.com/en-us/library/d93tkzah.aspx),

This method performs a word (case-sensitive and culture-sensitive) search using the current culture.

So you may get different results if you run it under a different culture (via regional and language settings in Control Panel).

In this particular case, you probably won't have a problem, but throw an i in the search string and run it in Turkey and it will probably ruin your day.

See MSDN: http://msdn.microsoft.com/en-us/library/ms973919.aspx

These new recommendations and APIs exist to alleviate misguided assumptions about the behavior of default string APIs. The canonical example of bugs emerging where non-linguistic string data is interpreted linguistically is the "Turkish-I" problem.

For nearly all Latin alphabets, including U.S. English, the character i (\u0069) is the lowercase version of the character I (\u0049). This casing rule quickly becomes the default for someone programming in such a culture. However, in Turkish ("tr-TR"), there exists a capital "i with a dot," character (\u0130), which is the capital version of i. Similarly, in Turkish, there is a lowercase "i without a dot," or (\u0131), which capitalizes to I. This behavior occurs in the Azeri culture ("az") as well.

Therefore, assumptions normally made about capitalizing i or lowercasing I are not valid among all cultures. If the default overloads for string comparison routines are used, they will be subject to variance between cultures. For non-linguistic data, as in the following example, this can produce undesired results:

    Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US")
Console.WriteLine("Culture = {0}",
   Thread.CurrentThread.CurrentCulture.DisplayName);
Console.WriteLine("(file == FILE) = {0}", 
   (String.Compare("file", "FILE", true) == 0));

Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
Console.WriteLine("Culture = {0}",
   Thread.CurrentThread.CurrentCulture.DisplayName);
Console.WriteLine("(file == FILE) = {0}", 
   (String.Compare("file", "FILE", true) == 0));

Because of the difference of the comparison of I, results of the comparisons change when the thread culture is changed. This is the output:

Culture = English (United States)
(file == FILE) = True
Culture = Turkish (Turkey)
(file == FILE) = False

Here is an example without case:

var s1 = "é"; //é as one character (ALT+0233)
var s2 = "é"; //'e', plus combining acute accent U+301 (two characters)

Console.WriteLine(s1.IndexOf(s2, StringComparison.Ordinal)); //-1
Console.WriteLine(s1.IndexOf(s2, StringComparison.InvariantCulture)); //0
Console.WriteLine(s1.IndexOf(s2, StringComparison.CurrentCulture)); //0

CA1309: UseOrdinalStringComparison

It doesn't hurt to not use it, but "by explicitly setting the parameter to either the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase, your code often gains speed, increases correctness, and becomes more reliable.".


What exactly is Ordinal, and why does it matter to your case?

An operation that uses ordinal sort rules performs a comparison based on the numeric value (Unicode code point) of each Char in the string. An ordinal comparison is fast but culture-insensitive. When you use ordinal sort rules to sort strings that start with Unicode characters (U+), the string U+xxxx comes before the string U+yyyy if the value of xxxx is numerically less than yyyy.

And, as you stated... the string value you are reading in is not culture sensitive, so it makes sense to use an Ordinal comparison as opposed to a Word comparison. Just remember, Ordinal means "this isn't culture sensitive".

To answer your specific question: No, but a static analysis tool is not going to be able to realize that your input value will never have locale-specific information in it.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!