I have a list of objects I wish to sort based on a field attr of type string. I tried using -
list.sort(function (a, b) {
retur
I had been bothered about this for a long time, so I finally researched it and give you this long-winded reason for why things are the way they are.
From the spec:
Section 11.9.4 The Strict Equals Operator ( === )
The production EqualityExpression : EqualityExpression === RelationalExpression
is evaluated as follows:
- Let lref be the result of evaluating EqualityExpression.
- Let lval be GetValue(lref).
- Let rref be the result of evaluating RelationalExpression.
- Let rval be GetValue(rref).
- Return the result of performing the strict equality comparison
rval === lval. (See 11.9.6)
So now we go to 11.9.6
11.9.6 The Strict Equality Comparison Algorithm
The comparison x === y, where x and y are values, produces true or false.
Such a comparison is performed as follows:
- If Type(x) is different from Type(y), return false.
- If Type(x) is Undefined, return true.
- If Type(x) is Null, return true.
- If Type(x) is Number, then
...
- If Type(x) is String, then return true if x and y are exactly the
same sequence of characters (same length and same characters in
corresponding positions); otherwise, return false.
That's it. The triple equals operator applied to strings returns true iff the arguments are exactly the same strings (same length and same characters in corresponding positions).
So === will work in the cases when we're trying to compare strings which might have arrived from different sources, but which we know will eventually have the same values - a common enough scenario for inline strings in our code. For example, if we have a variable named connection_state, and we wish to know which one of the following states ['connecting', 'connected', 'disconnecting', 'disconnected'] it is in right now, we can directly use ===.
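As a minimal sketch of that scenario (the describeConnection helper and its return strings are hypothetical, made up for illustration; only the state names come from the text above), comparing against inline string constants with === looks like this:

```javascript
// State names are taken from the example above; everything else is illustrative.
const STATES = ['connecting', 'connected', 'disconnecting', 'disconnected'];

function describeConnection(connection_state) {
  // Reject anything that is not one of the known constants.
  if (!STATES.includes(connection_state)) {
    return 'unknown state';
  }
  if (connection_state === 'connected') {
    return 'ready';
  }
  if (connection_state === 'disconnected') {
    return 'offline';
  }
  return 'in transition';
}

// A string assembled at runtime still equals the literal, because ===
// compares the sequence of code units, not where the string came from.
const built = ['con', 'nected'].join('');
console.log(built === 'connected');
console.log(describeConnection(built));
```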
But there's more. Just above 11.9.4, there is a short note:
NOTE 4
Comparison of Strings uses a simple equality test on sequences of code
unit values. There is no attempt to use the more complex, semantically oriented
definitions of character or string equality and collating order defined in the
Unicode specification. Therefore Strings values that are canonically equal
according to the Unicode standard could test as unequal. In effect this
algorithm assumes that both Strings are already in normalized form.
Hmm. What now? Externally obtained strings can, and most likely will, be weird unicodey, and our gentle === won't do them justice. In comes localeCompare to the rescue:
15.5.4.9 String.prototype.localeCompare (that)
...
The actual return values are implementation-defined to permit implementers
to encode additional information in the value, but the function is required
to define a total ordering on all Strings and to return 0 when comparing
Strings that are considered canonically equivalent by the Unicode standard.
We can go home now.
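To make the note and the localeCompare requirement concrete, here is a small sketch using the letter é in its precomposed and decomposed forms. String.prototype.normalize is an ES2015 addition that the quoted spec text does not mention, so its use here is an assumption about your runtime. The localeCompare result should be 0 according to the passage quoted above, but it depends on the engine's collation support, so check it on the engines you target.

```javascript
const precomposed = '\u00e9'; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const decomposed = 'e\u0301'; // 'e' followed by U+0301 COMBINING ACUTE ACCENT

// === compares code unit sequences, so these canonically equal strings differ.
const strictlyEqual = precomposed === decomposed;
console.log(strictlyEqual); // false
console.log(precomposed.length, decomposed.length); // 1 2

// Normalizing both sides to the same form (NFC here) makes them identical.
const equalAfterNfc =
  precomposed.normalize('NFC') === decomposed.normalize('NFC');
console.log(equalAfterNfc); // true

// Engine-dependent: the quoted spec text requires 0 for canonically
// equivalent strings.
console.log(precomposed.localeCompare(decomposed));
```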
tl;dr;
To compare strings in javascript, use localeCompare; if you know that the strings have no non-ASCII components because they are, for example, internal program constants, then === also works.
Nested ternary arrow function
(a,b) => (a < b ? -1 : a > b ? 1 : 0)
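A quick usage sketch for that comparator (the name byCodeUnit and the sample array are mine, not from the answer). Keep in mind that < and > on strings compare UTF-16 code units, so uppercase ASCII letters sort before lowercase ones:

```javascript
// The nested ternary comparator from above, given a name for reuse.
const byCodeUnit = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

const words = ['pear', 'Zebra', 'apple', 'pear'];

// Copy before sorting, because Array.prototype.sort sorts in place.
const sortedWords = [...words].sort(byCodeUnit);

console.log(sortedWords); // [ 'Zebra', 'apple', 'pear', 'pear' ]
console.log(byCodeUnit('pear', 'pear')); // 0
```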
I was really annoyed about this string natural sorting order so I took quite some time to investigate this issue. I hope this helps.
localeCompare() character support is badass, just use it.
As pointed out by Shog9, the answer to your question is:
return item1.attr.localeCompare(item2.attr);
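Put into a complete call, and assuming a hypothetical list of objects whose attr field is always a string, that looks roughly like this:

```javascript
// Sample data is illustrative; only the attr field name comes from the question.
const list = [
  { id: 1, attr: 'banana' },
  { id: 2, attr: 'apple' },
  { id: 3, attr: 'cherry' },
];

// sort() reorders the array in place and also returns it.
list.sort(function (item1, item2) {
  return item1.attr.localeCompare(item2.attr);
});

const sortedAttrs = list.map((item) => item.attr);
console.log(sortedAttrs); // [ 'apple', 'banana', 'cherry' ]
```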
There are quite a few custom implementations out there that try to do string comparison in what is more precisely called "natural string sort order".
When "playing" with these implementations, I always noticed some strange "natural sorting order" choice, or rather mistakes (or omissions in the best cases).
Typically, special characters (space, dash, ampersand, brackets, and so on) are not processed correctly.
You will then find them appearing mixed up in different places, typically that could be:
When one would have expected special characters to all be "grouped" together in one place, except for the space special character maybe (which would always be the first character). That is, either all before numbers, or all between numbers and letters (lowercase & uppercase being "together" one after another), or all after letters.
My conclusion is that they all fail to provide a consistent order when I start adding even slightly unusual characters (i.e. characters with diacritics or characters such as dash, exclamation mark and so on).
Research on the custom implementations:
- Natural Compare Lite https://github.com/litejs/natural-compare-lite : Fails at sorting consistently https://github.com/litejs/natural-compare-lite/issues/1 and http://jsbin.com/bevututodavi/1/edit?js,console , basic latin characters sorting http://jsbin.com/bevututodavi/5/edit?js,console
- Natural Sort https://github.com/javve/natural-sort : Fails at sorting consistently, see issue https://github.com/javve/natural-sort/issues/7 and see basic latin characters sorting http://jsbin.com/cipimosedoqe/3/edit?js,console
- Javascript Natural Sort https://github.com/overset/javascript-natural-sort : seems rather neglected since February 2012, Fails at sorting consistently, see issue https://github.com/overset/javascript-natural-sort/issues/16
- Alphanum http://www.davekoelle.com/files/alphanum.js , Fails at sorting consistently, see http://jsbin.com/tuminoxifuyo/1/edit?js,console
The oldest implementation of localeCompare() (without the locales and options arguments) is supported by IE6+, see http://msdn.microsoft.com/en-us/library/ie/s4esdbwz(v=vs.94).aspx (scroll down to localeCompare() method).
The built-in localeCompare() method does a much better job at sorting, even international & special characters.
The only problem using the localeCompare() method is that "the locale and sort order used are entirely implementation dependent". In other words, when using localeCompare such as stringOne.localeCompare(stringTwo): Firefox, Safari, Chrome & IE have a different sort order for Strings.
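One way to reduce that variation, on engines that implement the newer locales and options arguments (ECMA-402), is to pass an explicit locale instead of relying on the environment default. The sketch below assumes such an engine; the 'en' locale tag and the sample file names are arbitrary choices of mine. The numeric option asks for runs of digits to be compared as numbers, which is what most people mean by natural sort order.

```javascript
const names = ['file10.txt', 'file2.txt', 'file1.txt'];

// Default sort: code unit order, so 'file10' lands before 'file2'.
const plainOrder = [...names].sort();

// Explicit locale plus numeric collation (assumes ECMA-402 support).
const naturalOrder = [...names].sort((a, b) =>
  a.localeCompare(b, 'en', { numeric: true })
);

console.log(plainOrder);   // [ 'file1.txt', 'file10.txt', 'file2.txt' ]
console.log(naturalOrder); // [ 'file1.txt', 'file2.txt', 'file10.txt' ]
```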
Research on the browser-native implementations:
Implementing a solid algorithm (meaning: consistent but also covering a wide range of characters) is a very tough task. Unicode defines well over 100,000 characters and covers more than 120 scripts (writing systems). Fortunately, there is a specification for this task, called the "Unicode Collation Algorithm", which can be found at http://www.unicode.org/reports/tr10/ . You can find more information about this in this question I posted https://softwareengineering.stackexchange.com/questions/257286/is-there-any-language-agnostic-specification-for-string-natural-sorting-order
So considering the current level of support provided by the javascript custom implementations I came across, we will probably never see anything getting close to supporting all these characters & scripts. Hence I would rather use the browsers' native localeCompare() method. Yes, it does have the downside of being inconsistent across browsers, but basic testing shows it covers a much wider range of characters, allowing solid & meaningful sort orders.
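As a rough illustration of that wider character coverage, the sketch below sorts a few letters with and without localeCompare. The 'de' locale tag and the sample letters are my own assumptions, and the result relies on an engine that supports the locales argument and ships collation data.

```javascript
const letters = ['z', 'ä', 'a'];

// Code unit order: 'ä' (U+00E4) has a higher value than 'z', so it goes last.
const codeUnitOrder = [...letters].sort();

// Locale-aware order: 'ä' is collated next to 'a'.
const germanOrder = [...letters].sort((a, b) => a.localeCompare(b, 'de'));

console.log(codeUnitOrder); // [ 'a', 'z', 'ä' ]
console.log(germanOrder);   // [ 'a', 'ä', 'z' ]
```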
So as pointed out by Shog9, the answer to your question is:
return item1.attr.localeCompare(item2.attr);
Thanks to Shog9's nice answer, which put me in the "right" direction, I believe.
Since strings can be compared directly in javascript, this will do the job:
list.sort(function (a, b) {
// Return 0 for equal values so the comparator stays consistent
if (a.attr === b.attr) return 0;
return a.attr > b.attr ? 1 : -1;
})
Subtraction in a sort function is used only when a non-alphabetical (numerical) sort is desired, and of course it does not work with strings.
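To see why, note that subtracting two non-numeric strings yields NaN, which gives sort() nothing to order by. A small sketch with sample values of my own:

```javascript
// Neither string converts to a number, so the difference is NaN.
const stringDiff = 'b' - 'a';
console.log(stringDiff); // NaN

// Numeric strings are converted, which can hide the problem in tests.
const numericStringDiff = '10' - '9';
console.log(numericStringDiff); // 1

// For actual numbers, subtraction is the usual comparator.
const numbers = [10, 9, 100];
numbers.sort((a, b) => a - b);
console.log(numbers); // [ 9, 10, 100 ]
```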
list.sort((a, b) => (a.attr > b.attr) - (a.attr < b.attr))
Or
list.sort((a, b) => +(a.attr > b.attr) || -(a.attr < b.attr))
Casting a boolean value to a number yields the following:
true -> 1
false -> 0
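Consider three possible patterns:
x is larger than y:
(x > y) - (x < y) -> 1 - 0 -> 1
x is equal to y:
(x > y) - (x < y) -> 0 - 0 -> 0
x is smaller than y:
(x > y) - (x < y) -> 0 - 1 -> -1
(Alternative)
x is larger than y:
+(x > y) || -(x < y) -> 1 || -0 -> 1
x is equal to y:
+(x > y) || -(x < y) -> 0 || -0 -> -0 (which sort treats the same as 0)
x is smaller than y:
+(x > y) || -(x < y) -> 0 || -1 -> -1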
So this logic is equivalent to a typical sort comparator function:
if (x == y) {
return 0;
}
return x > y ? 1 : -1;
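A quick sanity check of that equivalence; the function names and sample pairs below are my own, chosen to cover the larger, equal and smaller cases:

```javascript
const bySubtraction = (x, y) => (x > y) - (x < y);
const byLogicalOr = (x, y) => +(x > y) || -(x < y);
const verbose = (x, y) => {
  if (x == y) {
    return 0;
  }
  return x > y ? 1 : -1;
};

const pairs = [['b', 'a'], ['a', 'a'], ['a', 'b'], ['', 'a'], ['Z', 'a']];

// byLogicalOr returns -0 for equal inputs; -0 === 0 is true in JavaScript,
// so the strict comparison below still treats the results as matching.
const allAgree = pairs.every(
  ([x, y]) =>
    bySubtraction(x, y) === verbose(x, y) &&
    byLogicalOr(x, y) === verbose(x, y)
);

console.log(allAgree); // true
```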
Use String.prototype.localeCompare as per your example:
list.sort(function (a, b) {
return ('' + a.attr).localeCompare(b.attr);
})
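The '' + a.attr part matters when attr is not guaranteed to be a string. The sketch below uses a hypothetical list with a number mixed in; the argument side (b.attr) needs no explicit cast because localeCompare converts its argument to a string itself.

```javascript
// Illustrative data: one attr is a number rather than a string.
const list = [{ attr: 'b' }, { attr: 10 }, { attr: 'a' }];

// Calling localeCompare directly on a number fails: numbers have no such method.
let threwTypeError = false;
try {
  (10).localeCompare('a');
} catch (e) {
  threwTypeError = e instanceof TypeError;
}
console.log(threwTypeError); // true

// Coercing the receiver to a string avoids the exception.
list.sort(function (a, b) {
  return ('' + a.attr).localeCompare(b.attr);
});

const sortedAttrs = list.map((item) => item.attr);
console.log(sortedAttrs); // [ 10, 'a', 'b' ]
```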
We force a.attr to be a string to avoid exceptions. localeCompare has been supported since Internet Explorer 6 and Firefox 1. You may also see the following code used that doesn't respect a locale:
if (item1.attr < item2.attr)
return -1;
if (item1.attr > item2.attr)
return 1;
return 0;