String comparison technique used by Python

前端 未结 7 1989
心在旅途
心在旅途 2020-11-21 15:50

I\'m wondering how Python does string comparison, more specifically how it determines the outcome when a less than (<) or greater than (>) op

7条回答
  •  别那么骄傲
    2020-11-21 16:10

    Take a look also at How do I sort unicode strings alphabetically in Python? where the discussion is about sorting rules given by the Unicode Collation Algorithm (http://www.unicode.org/reports/tr10/).

    To reply to the comment

    What? How else can ordering be defined other than left-to-right?

    by S.Lott, there is a famous counter-example when sorting French language. It involves accents: indeed, one could say that, in French, letters are sorted left-to-right and accents right-to-left. Here is the counter-example: we have e < é and o < ô, so you would expect the words cote, coté, côte, côté to be sorted as cote < coté < côte < côté. Well, this is not what happens, in fact you have: cote < côte < coté < côté, i.e., if we remove "c" and "t", we get oe < ôe < oé < ôé, which is exactly right-to-left ordering.

    And a last remark: you shouldn't be talking about left-to-right and right-to-left sorting but rather about forward and backward sorting.

    Indeed there are languages written from right to left and if you think Arabic and Hebrew are sorted right-to-left you may be right from a graphical point of view, but you are wrong on the logical level!

    Indeed, Unicode considers character strings encoded in logical order, and writing direction is a phenomenon occurring on the glyph level. In other words, even if in the word שלום the letter shin appears on the right of the lamed, logically it occurs before it. To sort this word one will first consider the shin, then the lamed, then the vav, then the mem, and this is forward ordering (although Hebrew is written right-to-left), while French accents are sorted backwards (although French is written left-to-right).

提交回复
热议问题